Re: Cassandra/HBase storage backend issues


Jason Plurad <plu...@...>
 

Hi Mike,

One thing to watch out for is clean transaction handling. Check out the TinkerPop docs on Graph Transactions, especially the third paragraph. It helps to do a graph.tx().rollback() before running your queries, to discard any stale transaction state, and then to make sure you commit or roll back when you're done, in a try/finally block.
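In Gremlin-Groovy, that pattern looks roughly like the sketch below (assuming `graph` is an open JanusGraph instance and `g` its traversal source; the property key and value here are made up for illustration):

```groovy
// Roll back first to discard any stale transaction state from earlier reads
graph.tx().rollback()

try {
    // run the traversals / mutations inside a fresh transaction
    g.V().has('myKey', 'myValue').property('processed', true).iterate()
    graph.tx().commit()
} finally {
    // harmless if the commit already succeeded; otherwise this
    // cleans up the failed transaction instead of leaving it open
    graph.tx().rollback()
}
```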

Do those traversals use a mixed index? Keep in mind that there is a refresh interval in ES (Solr has something similar), so if you're querying immediately after inserting the data, the changes might not be visible yet.
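If you need to rule that out in a test, Elasticsearch has a refresh API that forces newly indexed documents to become searchable immediately (the index name below is a placeholder; check which index names JanusGraph actually created in your cluster):

```
POST /janusgraph/_refresh
```

This is only reasonable in tests; in production you normally just live with the refresh interval.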

-- Jason


On Monday, June 19, 2017 at 11:09:33 AM UTC-4, HadoopMarc wrote:
Hi Mike,

Seeing no expert answers up till now, I can only provide a general reply. I can think of the following lines of thinking to explain your situation:
  • HBase fails to provide row-based consistency: extremely unlikely, given the many applications that rely on it.
  • JanusGraph fails to provide consistency between instances (e.g. by using out-of-date caches). Do you use multiple JanusGraph instances, or multiple threads that access the same JanusGraph instance?
  • Your application fails to handle exceptions the right way (e.g. it ignores them).
  • Your application has logic faults: not so likely, because you have been debugging for a while.
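On the second point: if multiple JanusGraph instances share the same storage backend, the database-level cache of one instance can serve data that another instance has since changed. As a sketch (at some cost in read performance), it can be switched off in the graph's properties file:

```
# janusgraph.properties (sketch): disable the database-level cache so reads
# go through to HBase/Cassandra instead of a possibly stale local cache
cache.db-cache=false
```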
If you want to proceed with this, could you post the code you use on GitHub, so that others can confirm the behavior and/or inspect the configs? Ideally, you would provide your code in the form of a test like:
 https://github.com/JanusGraph/janusgraph/blob/236dd930a7af35061e393ea8bb1ee6eb65f924b2/janusgraph-hbase-parent/janusgraph-hbase-core/src/test/java/org/janusgraph/graphdb/hbase/HBasePartitionGraphTest.java

Other ideas still welcome!

Marc

On Sunday, June 18, 2017 at 08:38:02 UTC+2, mi...@... wrote:
Hi! I'm running into an issue and wondering if anyone has tips. I'm using HBase (I also tried Cassandra, with the same result), and preprocessing our data yields inconsistent results. We run through a query, and for each vertex with a given property we run a traversal on it and calculate properties or insert edges that weren't inserted on upload, to boost the performance of our eventual traversals.

Our tests run perfectly with TinkerGraph, but with the HBase or Cassandra backend the tests sometimes fail: sometimes the calculated properties are completely wrong, and sometimes edges aren't created when needed. A preprocessing task may depend on the output of a previous preprocessing task that ran only seconds earlier. I think this is eventual consistency breaking the traversal, but I'm not sure how to get 100% accuracy (where the current preprocessing task can be 100% confident that it reads the correct value written by a previous preprocessing task).

I create a transaction for each preprocessing operation and commit it once it succeeds, but this doesn't seem to fix the issue. Any ideas?

Thanks,
Mike
