JanusGraph 0.6.0: Unable to open a connection with JanusGraphFactory at CL=ONE when quorum is lost


Umesh Gade
 

Hi,
We just upgraded JanusGraph to 0.6.0 and started seeing a failure in a scenario that worked before.
The scenario: we open a connection with read/write CL="ONE" using JanusGraphFactory, but when quorum is lost, the connection fails to open. What changed in this area, and what do we need to do to fix it?

Graph config passed:
storage.backend=cql
storage.port=9042
storage.cql.keyspace=test_ks
storage.cql.local-datacenter=dc1
storage.cql.read-consistency-level=ONE
storage.cql.write-consistency-level=ONE
storage.cql.executor-service.enabled=false
storage.cql.atomic-batch-mutate=false
graph.set-vertex-id=true
query.force-index=false
query.optimizer-backend-access=false
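A note on why the settings above may not cover the failing call: in the trace that follows, the request fails inside KCVSConfiguration.get, called from ReadConfigurationBuilder.buildGlobalConfiguration. That means JanusGraph is reading its stored global configuration during open, not running a user query. The JanusGraph CQL configuration reference describes storage.cql.only-use-local-consistency-for-system-operations as making system queries use LOCAL_QUORUM instead of QUORUM. This suggests that system operations run at QUORUM by default, regardless of storage.cql.read-consistency-level and write-consistency-level. The Java sketch below is a simplified model of that rule under this assumption. It is not JanusGraph's code, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/**
 * Hypothetical, simplified model of which consistency level the CQL backend
 * applies to which kind of operation, based on the documented meaning of the
 * storage.cql.* options. This is NOT JanusGraph's implementation.
 */
class EffectiveConsistency {
    static final String READ_CL = "storage.cql.read-consistency-level";
    static final String WRITE_CL = "storage.cql.write-consistency-level";
    static final String LOCAL_SYSTEM_OPS =
            "storage.cql.only-use-local-consistency-for-system-operations";

    /** Parses JanusGraph-style key=value properties text. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Not expected for an in-memory StringReader.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Level for regular graph reads; QUORUM is JanusGraph's documented default. */
    static String readLevel(Properties props) {
        return props.getProperty(READ_CL, "QUORUM");
    }

    /** Level for regular graph writes; QUORUM is JanusGraph's documented default. */
    static String writeLevel(Properties props) {
        return props.getProperty(WRITE_CL, "QUORUM");
    }

    /**
     * Level for system operations, e.g. reading the stored global configuration
     * during JanusGraphFactory.open(): QUORUM unless the local-consistency flag is
     * true. In this model the read/write consistency settings are ignored here.
     */
    static String systemLevel(Properties props) {
        boolean localOnly = Boolean.parseBoolean(props.getProperty(LOCAL_SYSTEM_OPS, "false"));
        return localOnly ? "LOCAL_QUORUM" : "QUORUM";
    }

    public static void main(String[] args) {
        // Subset of the configuration posted above.
        Properties posted = parse(String.join("\n",
                "storage.backend=cql",
                "storage.cql.local-datacenter=dc1",
                "storage.cql.read-consistency-level=ONE",
                "storage.cql.write-consistency-level=ONE"));
        System.out.println("graph reads:  " + readLevel(posted));   // ONE
        System.out.println("graph writes: " + writeLevel(posted));  // ONE
        System.out.println("system ops:   " + systemLevel(posted)); // QUORUM
    }
}
```

Under this model, the posted configuration sets ONE for graph reads and writes but leaves system operations at QUORUM. This would match the "consistency QUORUM" in the exception.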

Below is the exception we got:
Opening connection to graph with test_ks@localhost:9042
org.janusgraph.core.JanusGraphException: Could not execute operation due to backend exception
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:54)
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:117)
        at org.janusgraph.diskstorage.configuration.backend.KCVSConfiguration.get(KCVSConfiguration.java:96)
        at org.janusgraph.diskstorage.configuration.BasicConfiguration.isFrozen(BasicConfiguration.java:105)
        at org.janusgraph.diskstorage.configuration.builder.ReadConfigurationBuilder.buildGlobalConfiguration(ReadConfigurationBuilder.java:81)
        at org.janusgraph.graphdb.configuration.builder.GraphDatabaseConfigurationBuilder.build(GraphDatabaseConfigurationBuilder.java:67)
        at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:176)
        at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:147)
        at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:127)
        at ***.TestCli.openConnection(TestCli.java:140)        
Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Could not successfully complete backend operation due to repeated temporary exceptions after PT1M
        at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:98)
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:52)
        ... 11 more
Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend
        at io.vavr.API$Match$Case0.apply(API.java:5135)
        at io.vavr.API$Match.of(API.java:5092)
        at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.lambda$static$0(CQLKeyColumnValueStore.java:120)
        at org.janusgraph.diskstorage.cql.function.slice.CQLSimpleSliceFunction.interruptibleWait(CQLSimpleSliceFunction.java:50)
        at org.janusgraph.diskstorage.cql.function.slice.CQLSimpleSliceFunction.getSlice(CQLSimpleSliceFunction.java:39)
        at org.janusgraph.diskstorage.cql.function.slice.AbstractCQLSliceFunction.getSlice(AbstractCQLSliceFunction.java:48)
        at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.getSlice(CQLKeyColumnValueStore.java:358)
        at org.janusgraph.diskstorage.configuration.backend.KCVSConfiguration$1.call(KCVSConfiguration.java:99)
        at org.janusgraph.diskstorage.configuration.backend.KCVSConfiguration$1.call(KCVSConfiguration.java:96)
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:106)
        at org.janusgraph.diskstorage.util.BackendOperation$1.call(BackendOperation.java:120)
        at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:66)
        ... 12 more
Caused by: java.util.concurrent.ExecutionException: com.datastax.oss.driver.api.core.AllNodesFailedException: All 1 node(s) tried for the query failed (showing first 1 nodes, use getAllErrors() for more): Node(endPoint=localhost/127.0.0.1:9042, hostId=779642c7-23bb-46d4-88fa-6ae08f2f9e24, hashCode=61feb06d): [com.datastax.oss.driver.api.core.servererrors.UnavailableException: Not enough replicas available for query at consistency QUORUM (2 required but only 1 alive)]
        at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
        at org.janusgraph.diskstorage.cql.function.slice.CQLSimpleSliceFunction.interruptibleWait(CQLSimpleSliceFunction.java:45)
        ... 20 more
Caused by: com.datastax.oss.driver.api.core.AllNodesFailedException: All 1 node(s) tried for the query failed (showing first 1 nodes, use getAllErrors() for more): Node(endPoint=localhost/127.0.0.1:9042, hostId=779642c7-23bb-46d4-88fa-6ae08f2f9e24, hashCode=61feb06d): [com.datastax.oss.driver.api.core.servererrors.UnavailableException: Not enough replicas available for query at consistency QUORUM (2 required but only 1 alive)]
        at com.datastax.oss.driver.api.core.AllNodesFailedException.fromErrors(AllNodesFailedException.java:55)
        at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler.sendRequest(CqlRequestHandler.java:261)
        at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler.access$1000(CqlRequestHandler.java:94)
        at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler$NodeResponseCallback.processRetryVerdict(CqlRequestHandler.java:849)
        at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler$NodeResponseCallback.processErrorResponse(CqlRequestHandler.java:828)
        at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler$NodeResponseCallback.onResponse(CqlRequestHandler.java:655)
        at com.datastax.oss.driver.internal.core.channel.InFlightHandler.channelRead(InFlightHandler.java:257)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:748)
        Suppressed: com.datastax.oss.driver.api.core.servererrors.UnavailableException: Not enough replicas available for query at consistency QUORUM (2 required but only 1 alive)
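The innermost cause is "Not enough replicas available for query at consistency QUORUM (2 required but only 1 alive)". Cassandra computes a quorum as floor(RF / 2) + 1 replicas, where RF is the keyspace's replication factor. With RF = 2 or RF = 3, two replicas must answer, while ONE needs only one. The failing request ran at QUORUM even though the graph was configured with ONE, so the configured level evidently did not apply to this particular request. The minimal Java sketch below reproduces the arithmetic for a single datacenter. The class name QuorumMath is hypothetical. The thread does not state the replication factor of test_ks, so RF = 3 in the example is an assumption.

```java
/** Hypothetical helper reproducing Cassandra's replica-count arithmetic for one datacenter. */
class QuorumMath {
    /** Replicas that must respond for QUORUM: floor(rf / 2) + 1. */
    static int quorum(int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be >= 1");
        }
        return replicationFactor / 2 + 1;
    }

    /** Replicas required for a few common consistency levels (single-datacenter model). */
    static int required(String level, int replicationFactor) {
        switch (level) {
            case "ONE":
                return 1;
            case "QUORUM":
                return quorum(replicationFactor);
            case "ALL":
                return replicationFactor;
            default:
                throw new IllegalArgumentException("unsupported level: " + level);
        }
    }

    /** True when enough replicas are alive; otherwise Cassandra rejects the request as unavailable. */
    static boolean available(String level, int replicationFactor, int aliveReplicas) {
        return aliveReplicas >= required(level, replicationFactor);
    }

    public static void main(String[] args) {
        int rf = 3;    // assumed replication factor of test_ks
        int alive = 1; // matches "only 1 alive" in the exception
        System.out.println("ONE available:    " + available("ONE", rf, alive));    // true
        System.out.println("QUORUM available: " + available("QUORUM", rf, alive)
                + " (" + quorum(rf) + " required but only " + alive + " alive)");  // false
    }
}
```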
--
Sincerely,
Umesh Gade


Umesh Gade
 

One update:
Cassandra version is 4.0.0

(Reply to the original message of Thu, Dec 2, 2021, 2:02 PM.)





hadoopmarc@...
 

Hi Umesh,

I assume that checking the compatibility matrix at https://docs.janusgraph.org/changelog/ made you post the additional comment about Cassandra 4.0.0 :-)

Indeed, support for Cassandra 4.0 is still an open issue.

Best wishes, Marc


Umesh Gade
 

Hi Marc,
Thanks for the reply. Yes, we are also testing thoroughly for compatibility issues, and so far we haven't found any except this one. The strange thing is that the issue posted here does NOT occur with JanusGraph 0.5.3 + Cassandra 4.0.
This issue has something to do with "storage.cql.only-use-local-consistency-for-system-operations". We could solve it by setting that flag to TRUE on a 2-node cluster, but the problem remains on clusters with 3 or more nodes.
Was any new code path that touches this flag during the JanusGraphFactory.open(...) call added in JanusGraph 0.6.0?
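Setting the flag to true switches system operations from QUORUM to LOCAL_QUORUM. QUORUM is computed over the replicas of all datacenters combined. LOCAL_QUORUM counts only the replicas in the coordinator's local datacenter, which here is storage.cql.local-datacenter=dc1. Whether the switch helps therefore depends on the keyspace's replication layout, and the thread does not show that layout. The Java sketch below compares both levels for two made-up layouts. The class, field names, and topologies are hypothetical and serve only to illustrate the counting rule.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of QUORUM vs LOCAL_QUORUM availability in a multi-datacenter keyspace. */
class LocalVsGlobalQuorum {
    /** Replica counts for one datacenter: its replication factor and how many replicas are alive. */
    static final class Dc {
        final int rf;
        final int alive;

        Dc(int rf, int alive) {
            this.rf = rf;
            this.alive = alive;
        }
    }

    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    /** QUORUM: majority of the replicas summed over all datacenters. */
    static boolean quorumAvailable(Map<String, Dc> dcs) {
        int totalRf = 0;
        int totalAlive = 0;
        for (Dc dc : dcs.values()) {
            totalRf += dc.rf;
            totalAlive += dc.alive;
        }
        return totalAlive >= quorum(totalRf);
    }

    /** LOCAL_QUORUM: majority of the replicas in the local datacenter only. */
    static boolean localQuorumAvailable(Map<String, Dc> dcs, String localDc) {
        Dc local = dcs.get(localDc);
        if (local == null) {
            throw new IllegalArgumentException("unknown datacenter: " + localDc);
        }
        return local.alive >= quorum(local.rf);
    }

    public static void main(String[] args) {
        // Hypothetical layout A: one replica in each of two datacenters, dc2 down.
        Map<String, Dc> a = new LinkedHashMap<>();
        a.put("dc1", new Dc(1, 1));
        a.put("dc2", new Dc(1, 0));
        // prints "A: QUORUM=false LOCAL_QUORUM=true"
        System.out.println("A: QUORUM=" + quorumAvailable(a)
                + " LOCAL_QUORUM=" + localQuorumAvailable(a, "dc1"));

        // Hypothetical layout B: three replicas in dc1, only one alive.
        Map<String, Dc> b = new LinkedHashMap<>();
        b.put("dc1", new Dc(3, 1));
        // prints "B: QUORUM=false LOCAL_QUORUM=false"
        System.out.println("B: QUORUM=" + quorumAvailable(b)
                + " LOCAL_QUORUM=" + localQuorumAvailable(b, "dc1"));
    }
}
```

In layout A, LOCAL_QUORUM needs only the one local replica, whereas QUORUM needs both. In layout B, both levels need two of the three dc1 replicas, so the flag cannot help. This is one possible reading of the 2-node vs 3-node observation above, not a confirmed diagnosis.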

(Reply to Marc's message of Sat, 4 Dec 2021, 17:14.)