GraphTraversal Thread Stuck


Sujay Bothe <ssbothe3@...>
 

Hello,


Janus version -  0.5.3
Cassandra version - 3.11.4


I am facing one issue where the GraphTraversal hasNext() call got stuck .
The thread from which the Traversal was invoked is still stuck and below is the stackTrace for it.


"MYTHREAD" #80 prio=5 os_prio=0 tid=0x00007f74f82f9000 nid=0x1ab9 in Object.wait() [0x00007f746a8ec000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:502)
        at io.vavr.concurrent.FutureImpl$$Lambda$226/250860313.run(Unknown Source)
        at io.vavr.control.Try.run(Try.java:105)
        at io.vavr.concurrent.FutureImpl.await(FutureImpl.java:114)
        - locked <0x00000000c42f2e60> (a java.lang.Object)
        at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.interruptibleWait(CQLKeyColumnValueStore.java:308)
        at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.getSlice(CQLKeyColumnValueStore.java:289)
        at org.janusgraph.diskstorage.keycolumnvalue.KCVSProxy.getSlice(KCVSProxy.java:76)
        at org.janusgraph.diskstorage.configuration.backend.KCVSConfiguration$1.call(KCVSConfiguration.java:97)
        at org.janusgraph.diskstorage.configuration.backend.KCVSConfiguration$1.call(KCVSConfiguration.java:94)
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:147)
        at org.janusgraph.diskstorage.util.BackendOperation$1.call(BackendOperation.java:161)
        at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:68)
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:54)
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:158)
        at org.janusgraph.diskstorage.configuration.backend.KCVSConfiguration.get(KCVSConfiguration.java:94)
        at org.janusgraph.graphdb.tinkerpop.JanusGraphVariables.get(JanusGraphVariables.java:46)
        at MyObjectStrategy.apply(MyObjectStrategy.java:411)
        at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversalStrategies.applyStrategies(DefaultTraversalStrategies.java:88)
        at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.applyStrategies(DefaultTraversal.java:124)
        at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:196)
        at MyTraversal.get(MyTraversal.java:300)


As you can see above, the CQLKeyColumnValueStore is waiting on the future object. 
It has been stuck for a couple of days now.

I went through the code of CQLKeyColumnValueStore.java of JanusGraph.
It is executing the sliceQuery using the CQLStoreManager.java's executorService threadpool.
The thread names for above thread pool starts with 'CQLStoreManager'


I checked the state of all  20 CQLStoreManager threads in the my java process and all of them are in PARKING state

CQLStoreManager[00]" #191278 daemon prio=5 os_prio=0 tid=0x00007f73f837e000 nid=0x3f0a waiting on condition [0x00007f73cd46c000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000c3c26ff8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)


So I am not able to understand that what happened to the CQLStoreManager thread which was suppose to update the future object on which
CQLKeyColumnValueStore.interruptibleWait(CQLKeyColumnValueStore.java:308) is waiting.

One more additional information that I want to share is that this has happened around the time at which one of the cassandra nodes was disconnected
and the traversal was failing with TemporaryBackendException (Enough replicas not available) .
But afterwards the node got added to the cluster and quorum was reached , still the thread remained stuck.

Can someone please help here ?
Do let me know if any additional information is needed. 


Thanks,
Sujay Bothe

Join janusgraph-users@lists.lfaidata.foundation to automatically receive all group messages.