High HBase backend 'configuration' row contention
We are running embedded JanusGraph (0.5.3) with an HBase backend (2.1.6) in our Spark jobs. Each Spark executor creates an instance of JanusGraph. At times there can be over 500 executors running simultaneously. Under those conditions, we observe heavy row contention for the 'configuration' row that JanusGraph creates as part of the initialization of the HBase table. Is there any recommendation on how to prevent/reduce this HBase row contention? Since the row is only created during HBase initialization and is never updated afterwards, could the data held by the configuration row be moved out of HBase and into a static file?
I do not understand the concept of row contention here. Isn't this config row simply the row that is retrieved most often on the region servers that contain it, and wouldn't other rows on those servers be served equally slowly?
HBase tends to compact tables into a limited number of large regions (typically 20 GB each). So, if you have an HDFS replication factor of 3 and your graph spans just two regions, at best 6 region servers of your HBase cluster can serve your 500 Spark executors.
So, maybe this gives you some hint on what is happening. Or maybe you have more details on how you came to the conclusion that there is such a thing as row contention?
Best wishes, Marc
Thanks for responding. The configuration row in question, which JanusGraph creates when the HBase table is first initialized, shows slow read performance due to simultaneous access by the Spark executors (400+). Again, each executor creates an embedded JanusGraph instance, and we found that the instance reads the config row every time JanusGraphFactory's open() method is called (numerous times per executor). This leads to the executors accessing this row at the same time and causes the row to respond slowly. The other graph data rows do NOT show this latency when reading/writing the graph. I hope that provides some clarification on the issue.
One thing that came to mind: did you apply the JanusGraphFactory inside a singleton object, so that all tasks from all cores in a Spark executor use the same JanusGraph instance? If not, this is an easy change that lowers the overhead of connection setups.
Best wishes, Marc
Thanks for the feedback and suggestion. We investigated applying the JanusGraphFactory inside a singleton object as you've suggested, but ran into the issue that the JanusGraphFactory is not serializable, as required for Spark singletons. Do you have any ideas on how to get around this issue?
"Not serializable" sounds as if you pass a JanusGraph instance from the Spark driver to the executor. The function that runs on the Spark executor should call some static function on the singleton object that holds the JanusGraph instance. If the singleton object is called for the first time, locally on each Spark executor, it creates the JanusGraph instance and its static convenience method returns a GraphTraversalSource g. If the executor function runs a second time (on the next partition of your RDD or DataFrame as input) it again calls the convenience function on the singleton object, but now gets a GraphTraversalSource returned from the existing JanusGraph instance.
Best wishes, Marc