Not able to reindex with Bigtable as backend


liqingtaobkd@...
 

Hi community,

I am new to JanusGraph and am trying to build a POC. I ran into an issue that I really need some help with. I am running JanusGraph on GCP with Bigtable as the storage backend, and I am trying to create a new vertex-centric index. Since I already have some data in the database, I need to reindex. The index is in "REGISTERED" status right now.
Here are the steps that I took:

mgmt = graph.openManagement()
flowsTo = mgmt.getEdgeLabel('flowsTo')
mgmt.updateIndex(mgmt.getRelationIndex(flowsTo, "flowsToByTimestamp"), SchemaAction.REINDEX).get()
 
I am not sure whether there is any way to check the status of the reindex process. I checked the CPU/memory usage of JanusGraph and Bigtable, and it suggests that the reindex should have finished within half an hour. However, the reindex command has not returned after hours, and the status is still REGISTERED. If I try to reindex again, it shows:
"Another job with the same id is already running: flowsToByTimestamp[flowsTo]"

I don't see any errors on the server side, so I have no clue right now. Any suggestion will be greatly appreciated. Thanks!
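On checking progress: as far as I know, in JanusGraph 0.5.x `updateIndex(...)` returns a `JanusGraphManagement.IndexJobFuture`, which extends `java.util.concurrent.Future<ScanMetrics>` and also offers `getIntermediateResult()` for a snapshot of a running job's metrics. So instead of a bare `.get()`, which can block indefinitely, one can wait with a timeout and poll. The sketch below models that pattern with the standard library only; the executor job and the `AtomicLong` counter are made-up stand-ins for the index job and its metrics, not JanusGraph classes.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicLong;

// Stdlib-only model: the Future stands in for IndexJobFuture, the AtomicLong for the
// intermediate metrics of a running reindex. Neither is a JanusGraph class.
class JobWait {
    /** Waits at most timeoutMs, checking every pollMs; returns true only if the job finished. */
    static boolean awaitWithProgress(Future<?> job, AtomicLong processed, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (System.nanoTime() < deadline) {
            try {
                job.get(pollMs, TimeUnit.MILLISECONDS); // returns as soon as the job is done
                return true;
            } catch (TimeoutException stillRunning) {
                System.out.println("still running, processed so far: " + processed.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop waiting, keep the interrupt visible
                return false;
            } catch (ExecutionException e) {
                throw new IllegalStateException("job failed", e.getCause()); // surface the job's own error
            }
        }
        return false; // timed out; the job may still be running in the background
    }

    public static void main(String[] args) {
        AtomicLong processed = new AtomicLong();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Simulated job: "processes" 5 items at roughly 20 ms each.
        Future<?> job = pool.submit(() -> {
            for (int i = 0; i < 5; i++) {
                try {
                    Thread.sleep(20);
                } catch (InterruptedException e) {
                    return;
                }
                processed.incrementAndGet();
            }
        });
        System.out.println("finished: " + awaitWithProgress(job, processed, 2_000, 30));
        pool.shutdown();
    }
}
```

The point of the design is that a timeout returns control to the console while the job keeps running, and a failed job surfaces its exception instead of leaving the caller blocked.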


hadoopmarc@...
 

Please be sure to run all the steps (including a graph.tx().rollback() before index creation and a mgmt.commit() after updating the index) from the example in:

https://docs.janusgraph.org/index-management/index-performance/#vertex-centric-indexes

Best wishes,   Marc


liqingtaobkd@...
 

Thanks for the reply. I carefully followed each step described in the docs. Before the reindex, I closed all open transactions and management instances. I sent the reindex command from the console and it never returned (I waited more than 10 hours):
mgmt.updateIndex(mgmt.getRelationIndex(flowsTo, "flowsToByTimestamp"), SchemaAction.REINDEX).get()

So I don't have a chance to commit.

But according to my monitoring of JanusGraph and Bigtable, there is no activity after 30 minutes.

Do you have any further suggestion?


liqingtaobkd@...
 

I found the following error in the log (JanusGraph 0.5.3 with Bigtable as backend). Any suggestions, please? I have been stuck on this for a few days...

org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend
    at org.janusgraph.diskstorage.hbase.HBaseStoreManager.mutateMany(HBaseStoreManager.java:460)
    at org.janusgraph.diskstorage.locking.consistentkey.ExpectedValueCheckingStoreManager.mutateMany(ExpectedValueCheckingStoreManager.java:79)
    at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction$1.call(CacheTransaction.java:94)
    at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction$1.call(CacheTransaction.java:91)
    at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:68)
    at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:54)
    at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction.persist(CacheTransaction.java:91)
    at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction.flushInternal(CacheTransaction.java:133)
    at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction.commit(CacheTransaction.java:196)
    at org.janusgraph.diskstorage.BackendTransaction.commit(BackendTransaction.java:150)
    at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.commit(StandardJanusGraphTx.java:1440)
    at org.janusgraph.graphdb.olap.job.IndexUpdateJob.workerIterationEnd(IndexUpdateJob.java:136)
    at org.janusgraph.graphdb.olap.job.IndexRepairJob.workerIterationEnd(IndexRepairJob.java:208)
    at org.janusgraph.graphdb.olap.VertexJobConverter.workerIterationEnd(VertexJobConverter.java:118)
    at org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScannerExecutor$Processor.run(StandardScannerExecutor.java:285)
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: IllegalStateException: 1 time, servers with issues: bigtable.googleapis.com
    at com.google.cloud.bigtable.hbase.BatchExecutor.batchCallback(BatchExecutor.java:288)
    at com.google.cloud.bigtable.hbase.BatchExecutor.batch(BatchExecutor.java:207)
    at com.google.cloud.bigtable.hbase.AbstractBigtableTable.batch(AbstractBigtableTable.java:185)
    at org.janusgraph.diskstorage.hbase.HTable1_0.batch(HTable1_0.java:51)
    at org.janusgraph.diskstorage.hbase.HBaseStoreManager.mutateMany(HBaseStoreManager.java:455)
    ... 14 more


liqingtaobkd@...
 

Sorry for the multiple emails. I found the final error from JanusGraph. I'm not sure whether it's a Bigtable issue or a JanusGraph/Bigtable compatibility issue. Can anybody help take a look?

3035627 [Thread-8] ERROR org.janusgraph.graphdb.olap.job.IndexRepairJob - Transaction commit threw runtime exception:

org.janusgraph.core.JanusGraphException: Could not commit transaction due to exception during persistence
    at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.commit(StandardJanusGraphTx.java:1449)
    at org.janusgraph.graphdb.olap.job.IndexUpdateJob.workerIterationEnd(IndexUpdateJob.java:136)
    at org.janusgraph.graphdb.olap.job.IndexRepairJob.workerIterationEnd(IndexRepairJob.java:208)
    at org.janusgraph.graphdb.olap.VertexJobConverter.workerIterationEnd(VertexJobConverter.java:118)
    at org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScannerExecutor$Processor.run(StandardScannerExecutor.java:285)
Caused by: org.janusgraph.core.JanusGraphException: Could not execute operation due to backend exception
    at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:56)
    at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction.persist(CacheTransaction.java:91)
    at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction.flushInternal(CacheTransaction.java:133)
    at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction.commit(CacheTransaction.java:196)
    at org.janusgraph.diskstorage.BackendTransaction.commit(BackendTransaction.java:150)
    at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.commit(StandardJanusGraphTx.java:1440)
    ... 4 more
Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Could not successfully complete backend operation due to repeated temporary exceptions after PT1M40S
    at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:100)
    at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:54)
    ... 9 more
Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend
    at org.janusgraph.diskstorage.hbase.HBaseStoreManager.mutateMany(HBaseStoreManager.java:460)
    at org.janusgraph.diskstorage.locking.consistentkey.ExpectedValueCheckingStoreManager.mutateMany(ExpectedValueCheckingStoreManager.java:79)
    at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction$1.call(CacheTransaction.java:94)
    at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction$1.call(CacheTransaction.java:91)
    at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:68)
    ... 10 more
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: IllegalStateException: 1 time, servers with issues: bigtable.googleapis.com
    at com.google.cloud.bigtable.hbase.BatchExecutor.batchCallback(BatchExecutor.java:288)
    at com.google.cloud.bigtable.hbase.BatchExecutor.batch(BatchExecutor.java:207)
    at com.google.cloud.bigtable.hbase.AbstractBigtableTable.batch(AbstractBigtableTable.java:185)
    at org.janusgraph.diskstorage.hbase.HTable1_0.batch(HTable1_0.java:51)
    at org.janusgraph.diskstorage.hbase.HBaseStoreManager.mutateMany(HBaseStoreManager.java:455)
    ... 14 more


hadoopmarc@...
 

The stacktraces you sent come from an IndexRepairJob, the scan job that a REINDEX runs to write the index entries. TemporaryBackendException is usually an indication of unbalanced distributed system components; apparently Bigtable cannot keep up with your index repair workers. Is it still possible to delete the index and retry from the start?
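The PT1M40S in the last trace is an ISO-8601 duration of 100 seconds, which as far as I can tell equals the default of storage.write-time: JanusGraph retries a temporarily failing write with exponential backoff until that budget is used up, and then gives up with exactly this kind of message. Below is a simplified model of that loop in plain Java with a simulated clock. It is not the actual BackendOperation source, and the 1 s initial backoff is an arbitrary choice.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Simplified model of a write retried under a time budget. NOT JanusGraph's BackendOperation
// source: time is simulated (nothing sleeps) and the initial backoff is an arbitrary choice.
class RetryModel {
    /** Returns how many attempts were needed; throws once the next wait would exceed the budget. */
    static int runWithBudget(BooleanSupplier attempt, Duration budget, Duration initialBackoff) {
        Duration elapsed = Duration.ZERO;
        Duration backoff = initialBackoff;
        int attempts = 0;
        while (true) {
            attempts++;
            if (attempt.getAsBoolean()) {
                return attempts; // the (simulated) mutation went through
            }
            if (elapsed.plus(backoff).compareTo(budget) > 0) {
                // Mirrors the wording in the log; Duration.toString() renders 100 s as "PT1M40S".
                throw new IllegalStateException("repeated temporary exceptions after " + budget);
            }
            elapsed = elapsed.plus(backoff);   // simulated sleep
            backoff = backoff.multipliedBy(2); // exponential backoff
        }
    }

    public static void main(String[] args) {
        Duration writeTime = Duration.parse("PT1M40S"); // the budget seen in the log
        System.out.println("budget in seconds: " + writeTime.getSeconds());
        try {
            runWithBudget(() -> false, writeTime, Duration.ofSeconds(1)); // backend never recovers
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a backend that keeps failing, the model gives up after seven attempts (waits of 1+2+4+8+16+32 s), because the next 64 s wait would overshoot the 100 s budget. Raising storage.write-time buys more attempts, but it probably will not help if Bigtable rejects the batch for a non-transient reason.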

Otherwise, you could check whether reindexing works with just a small graph. There is little to go on right now.

Best wishes,    Marc


hadoopmarc@...
 

I checked on the existing issues and the following one looks similar to your issue:
https://github.com/JanusGraph/janusgraph/issues/1803

There are also some older questions on the JanusGraph users list. The only workaround seems to be to define the index before adding the data.

Best wishes,     Marc


liqingtaobkd@...
 

Thanks a lot for your reply, Marc. I browsed through the older threads and didn't find a good solution for this.

"Bigtable cannot keep up with your index repair workers" - could you give a bit more insight into what an index repair job does, or point me to any documentation?
I have tried a few storage settings without any luck so far: storage.write-time, storage.lock.wait-time, storage.lock.expiry-time, etc. Do you think they will make a difference?

As you suggested, I'll try deleting the index and retrying from the start.
For our application, we do need the option of reindexing existing data, so I need to make sure it works. Have you seen a similar issue with Cassandra? We deploy on GCP, so we tried Bigtable first.
Do you have any recommendations for a storage backend on GCP?


hadoopmarc@...
 

The vertex-centric index is written to the storage backend, so I guess the section on write performance configs should be relevant:
https://docs.janusgraph.org/advanced-topics/bulk-loading/#optimizing-writes-and-reads

I have no idea whether row locking plays a role in writing the vertex-centric index. If it does, the config properties you mention are relevant, and maybe also the batch loading config, which disables locking:
https://docs.janusgraph.org/advanced-topics/bulk-loading/#batch-loading

ID allocation does not seem relevant (it has its own error messages, so you would notice).
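Following the docs linked above, the write-related options could be collected into an overlay for the JanusGraph instance that runs the reindex. This is a sketch: the option names (storage.buffer-size, storage.write-time, storage.batch-loading) come from the JanusGraph configuration reference, but the values are hypothetical placeholders to tune, and batch loading is only worth considering if nothing else writes to the graph while the reindex runs.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.Properties;

// Hypothetical overlay for the instance that runs the reindex.
// Option names are real JanusGraph settings; the VALUES are placeholders, not recommendations.
class ReindexWriteOverlay {
    static Properties overlay() {
        Properties p = new Properties();
        // Larger write batches mean fewer requests against Bigtable (default, as far as I know: 1024).
        p.setProperty("storage.buffer-size", "4096");
        // Longer retry budget for temporarily failing writes, in ms (default believed to be 100000 ms, i.e. PT1M40S).
        p.setProperty("storage.write-time", "300000");
        // Disables locking and some consistency checks; only if no other writers are active.
        p.setProperty("storage.batch-loading", "true");
        return p;
    }

    /** Renders the overlay in .properties syntax, ready to append to the graph's config file. */
    static String render() {
        StringWriter out = new StringWriter();
        try {
            overlay().store(out, "reindex overlay (placeholder values)");
        } catch (IOException e) {
            throw new IllegalStateException(e); // StringWriter does not actually fail
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(render());
    }
}
```

Keeping these settings in a separate overlay makes it easy to use them only for the instance that runs the reindex and to drop them again afterwards.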

Marc


owner.mad.epa@...
 

The reindex process may get stuck on ghost vertices; see https://github.com/JanusGraph/janusgraph/issues/1750

Try removing ghost vertices with GhostVertexRemover:

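import java.util.concurrent.ExecutionException;
import org.janusgraph.core.schema.JanusGraphManagement;
import org.janusgraph.diskstorage.BackendException;
import org.janusgraph.diskstorage.keycolumnvalue.scan.ScanMetrics;
import org.janusgraph.graphdb.olap.job.GhostVertexRemover;
import static org.janusgraph.graphdb.olap.job.GhostVertexRemover.*;

// `graph` is an open StandardJanusGraph (needed for getBackend()); `logger` is an SLF4J Logger.
try {
    // Scan the edge store and delete ghost vertices whose relations outlived the vertex itself.
    JanusGraphManagement.IndexJobFuture ghostRemover = graph.getBackend().buildEdgeScanJob()
            .setJob(new GhostVertexRemover(graph))
            .execute();
    ScanMetrics metrics = ghostRemover.get(); // blocks until the scan has finished
    logger.info("GhostVertexRemover statistics: removed vertices={}, removed relations={}, skipped ghosts={}",
            metrics.getCustom(REMOVED_VERTEX_COUNT),
            metrics.getCustom(REMOVED_RELATION_COUNT),
            metrics.getCustom(SKIPPED_GHOST_LIMIT_COUNT));
} catch (BackendException | InterruptedException | ExecutionException e) {
    throw new RuntimeException(e);
}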
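To make the idea concrete: the JanusGraph docs describe a ghost vertex as one that was removed in one transaction and concurrently modified in another on an eventually consistent backend, so some of its relations survive without the vertex itself. The toy model below scans in-memory rows and drops such leftovers, reporting the same three counters as the GhostVertexRemover snippet. It is not the GhostVertexRemover implementation: the `Row` type is hypothetical, and skipping ghosts above a relation limit is my reading of SKIPPED_GHOST_LIMIT_COUNT, with an arbitrary limit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Toy model of a ghost-vertex cleanup scan. NOT the GhostVertexRemover implementation:
// rows live in a plain Map, and the relation limit is a hypothetical parameter.
class GhostScanModel {
    /** Hypothetical row: is the vertex-existence marker present, and which relations remain. */
    static final class Row {
        final boolean existsMarker;
        final List<String> relations;

        Row(boolean existsMarker, List<String> relations) {
            this.existsMarker = existsMarker;
            this.relations = new ArrayList<>(relations);
        }
    }

    /** Removes ghost rows in place; returns {removedVertices, removedRelations, skippedGhosts}. */
    static long[] removeGhosts(Map<Long, Row> rows, int relationLimit) {
        long removedVertices = 0, removedRelations = 0, skippedGhosts = 0;
        Iterator<Map.Entry<Long, Row>> it = rows.entrySet().iterator();
        while (it.hasNext()) {
            Row row = it.next().getValue();
            if (row.existsMarker || row.relations.isEmpty()) {
                continue; // a live vertex (or an empty row): nothing to clean up
            }
            if (row.relations.size() > relationLimit) {
                skippedGhosts++; // too many leftovers to delete in one go, so leave it
                continue;
            }
            removedRelations += row.relations.size(); // drop the orphaned edges/properties
            removedVertices++;                         // and the half-existing vertex with them
            it.remove();
        }
        return new long[] {removedVertices, removedRelations, skippedGhosts};
    }

    public static void main(String[] args) {
        Map<Long, Row> rows = new HashMap<>();
        rows.put(1L, new Row(true, List.of("flowsTo->2")));          // healthy vertex
        rows.put(2L, new Row(false, List.of("flowsTo->3", "name"))); // ghost
        long[] stats = removeGhosts(rows, 1_000);
        System.out.println("removed vertices=" + stats[0] + ", removed relations=" + stats[1]
                + ", skipped ghosts=" + stats[2]);
    }
}
```

The model also suggests why ghosts could matter for a reindex: a job that walks every vertex row meets these half-existing entries, so cleaning them up first removes one possible source of trouble.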