Re: JanusGraph 0.5.2 and BigTable

Assaf Schwartz <schw...@...>

From time to time, usually after setting up a fresh copy of BT and Janus, I'll encounter errors related to locking. However, this doesn't happen every time.
Sorry, I can't seem to be able to copy the logs nicely from GCP Cloud Logging.

org.janusgraph.diskstorage.locking.PermanentLockingException: Local lock contention
    at org.janusgraph.diskstorage.locking.AbstractLocker.writeLock(
    at org.janusgraph.diskstorage.locking.consistentkey.ExpectedValueCheckingStore.acquireLock(
    at org.janusgraph.diskstorage.keycolumnvalue.KCVSProxy.acquireLock(
    at org.janusgraph.diskstorage.BackendTransaction.acquireIndexLock(
    at org.janusgraph.graphdb.database.StandardJanusGraph.prepareCommit(
    at org.janusgraph.graphdb.database.StandardJanusGraph.commit(
    at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.commit(
    at org.janusgraph.graphdb.tinkerpop.JanusGraphBlueprintsGraph$GraphTransaction.doCommit(
    at org.apache.tinkerpop.gremlin.structure.util.AbstractTransaction.commit(
    at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.onTraversalSuccess(
    at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.handleIterator(
    at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.lambda$iterateBytecodeTraversal$4(
    at java.util.concurrent.Executors$
    at java.util.concurrent.ThreadPoolExecutor.runWorker(
    at java.util.concurrent.ThreadPoolExecutor$
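For now I'm working around the intermittent contention by retrying on the client side. This is only a sketch, not from our actual code: the helper name, attempt count, and backoff are my own, and it assumes the server-side PermanentLockingException surfaces to gremlin-python as an error whose message contains the exception name (in practice you'd catch gremlin_python's GremlinServerError rather than bare Exception).

```python
import time

def retry_on_contention(write_fn, attempts=3, backoff_s=0.2):
    """Retry a write that may intermittently fail with lock contention."""
    for attempt in range(attempts):
        try:
            return write_fn()
        except Exception as exc:  # e.g. gremlin_python's GremlinServerError
            # Re-raise anything that isn't lock contention, or the final failure
            if "PermanentLockingException" not in str(exc) or attempt == attempts - 1:
                raise
            time.sleep(backoff_s * (attempt + 1))  # simple linear backoff
```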

On Thursday, December 17, 2020 at 10:54:32 AM UTC+2 Assaf Schwartz wrote:
Hi All,

I'm experiencing an issue with running JanusGraph (on top of GKE) against BigTable.
This is the general setup description:
  • We are using a single-node BigTable cluster (for development / integration purposes) with the vanilla 0.5.2 Docker image.
  • Indexing is configured to be done with ES (also running on GKE).
  • JanusGraph is configured through environment variables.
  • Interactions with JanusGraph are done only through a single gRPC server running gremlin-python; let's call it DB-SERVER.
  • The last time we tested against BT was with JanusGraph 0.4.1, precompiled to support HBase1.
  • All of our components communicate via gRPC.
Description of the problem:
  1. The DB-SERVER creates a Vertex i, generates some XML to represent work to be done, and sends it to another service for processing; let's call it ORCHESTRATOR.
  2. The ORCHESTRATOR generates two properties, w and r (local identifiers), and sends them back to the DB-SERVER so they will be set as properties on Vertex i. These two properties are also covered by mixed String indexes.
  3. After setting the properties, DB-SERVER acks ORCHESTRATOR, which starts processing. As part of the processing, ORCHESTRATOR sends updates back to the DB-SERVER using w and r.
  4. On receiving these updates, DB-SERVER tries to look up Vertex i based on w and r, like so:
    g.V().has("r", <some_r>).has("w", <some_w>).next()
  5. At that point, null / None is returned because the traversal fails to find Vertex i.
  6. Trying the same traversal in a separate console (Python and Gremlin) does fetch the vertex. Since it's a single-instance cluster, I ruled out any eventual consistency issues.
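For completeness, the step-4 lookup on the DB-SERVER is wrapped roughly like the sketch below (assuming gremlin-python; find_vertex is a hypothetical helper name, not our actual code). Using toList() instead of a bare next() makes the "not found" case an explicit None rather than a StopIteration:

```python
def find_vertex(g, r_value, w_value):
    """Look up Vertex i by its r and w properties; return None if absent."""
    # toList() returns [] when nothing matches, unlike next(), which raises
    matches = g.V().has("r", r_value).has("w", w_value).toList()
    return matches[0] if matches else None
```

Here g would be the usual remote traversal source, e.g. traversal().withRemote(DriverRemoteConnection('ws://<host>:8182/gremlin', 'g')).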
I'm not sure if it's a regression introduced after 0.4.1.
I've also validated that db-caching is turned off.
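To be concrete about what "db-caching turned off" means here, these are the janusgraph.properties keys I checked (set via environment variables in our case; the time/size values shown are illustrative, not our exact settings):

```properties
# database-level cache disabled
cache.db-cache=false
# only take effect when db-cache is enabled
cache.db-cache-time=180000
cache.db-cache-size=0.25
```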

Help! :)
Many thanks in advance,
