High HBase backend 'configuration' row contention


Tendai Munetsi
 

Hi,

 

We are running embedded JanusGraph (0.5.3) with an HBase backend (2.1.6) in our Spark jobs. Each Spark executor creates an instance of JanusGraph. At times there can be over 500 executors running simultaneously. Under those conditions, we observe heavy row contention for the ‘configuration’ row that JanusGraph creates as part of the initialization of the HBase table. Is there any recommendation on how to prevent/reduce this HBase row contention? As the row is only created during HBase initialization and is never updated subsequently, can the data held by the configuration row be moved out of HBase and into a static file?

 

Thanks,

Tendai





hadoopmarc@...
 

Hi Tendai,

I do not understand the concept of row contention here. Isn't this config row simply the row that is retrieved most often from the region servers that hold it, and aren't other rows on those servers served equally slowly?

HBase tends to compact tables into a limited number of large regions (typically 20 GB). So, if you have an HDFS replication factor of 3 and your graph has a size of just two regions, at best 6 region servers of your HBase cluster can serve your 500 Spark executors.

Maybe this gives you a hint about what is happening. Or do you have more details on how you came to the conclusion that there is such a thing as row contention?

Best wishes,
Marc


Tendai Munetsi
 

Hi Marc,

Thanks for responding. The configuration row in question, which is created by JanusGraph when the HBase table is first initialized, shows slow read performance due to simultaneous access by the Spark executors (400+). Again, each executor creates an embedded JanusGraph instance, and we found that the instance accesses the config row every time JanusGraphFactory's open() method is called (numerous times per executor). This leads to the executors hitting this row at the same time, which causes slow responses from that row. The other graph data rows do NOT show this latency while reading/writing the graph. I hope that provides some clarification on the issue.
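
For illustration, a minimal sketch of the access pattern described above (assuming a Java Spark job; the class name, properties file path, and vertex property are placeholders, not taken from our actual code). Every task opens its own embedded instance, and each open() call reads the stored configuration from the HBase backend:

import org.apache.spark.api.java.JavaRDD;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

public class PerTaskOpenSketch {
    // Hypothetical job method: every partition opens (and closes) its own
    // embedded JanusGraph instance, so each open() re-reads the stored
    // configuration from the HBase backend.
    static void writeNames(JavaRDD<String> names) {
        names.foreachPartition(iter -> {
            // placeholder properties file pointing at the HBase backend
            JanusGraph graph = JanusGraphFactory.open("conf/janusgraph-hbase.properties");
            try {
                // "name" is a placeholder property key
                iter.forEachRemaining(name -> graph.addVertex("name", name));
                graph.tx().commit();
            } finally {
                graph.close();
            }
        });
    }
}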

Regards,
Tendai


hadoopmarc@...
 

Hi Tendai,

Just one thing came to my mind: did you wrap the JanusGraphFactory in a singleton object, so that all tasks from all cores in a Spark executor use the same JanusGraph instance? If not, this is an easy change that lowers the overhead of connection setup.
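
For what it's worth, a minimal sketch of such a singleton (the class name, hostnames, and table name below are placeholders). It keeps one JanusGraph instance per executor JVM, so the stored configuration is read once per executor rather than once per open() call:

import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

public final class GraphHolder {
    // One shared instance per executor JVM, lazily created and reused by all tasks.
    private static volatile JanusGraph graph;

    private GraphHolder() {}

    public static JanusGraph get() {
        if (graph == null) {
            synchronized (GraphHolder.class) {
                if (graph == null) {
                    graph = JanusGraphFactory.build()
                            .set("storage.backend", "hbase")
                            .set("storage.hostname", "zk-1,zk-2,zk-3")  // placeholder ZooKeeper quorum
                            .set("storage.hbase.table", "janusgraph")   // placeholder table name
                            .open();
                }
            }
        }
        return graph;
    }
}

Tasks would then call GraphHolder.get() instead of JanusGraphFactory.open(...), and the instance stays open for the lifetime of the executor (optionally closed in a JVM shutdown hook).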

Best wishes,
Marc