Re: Bulk loading into JanusGraph with HBase

Jason Plurad <plu...@...>

Thanks for providing the code. It would be even better if you shared everything as a GitHub project that's easy to clone and build, contains the CSV files, and also includes the specific parameters you're passing into the program, like batchSize.

You didn't mention how slow is slow. What is the ingestion rate for vertices, properties, and edges? Some more concrete details would be helpful here. What does your HBase deployment look like?

         * Note: For unknown reasons, it seems that each modification to the
         * schema must be committed in its own transaction.

I noticed this comment in the code. I don't think that's true; GraphOfTheGodsFactory does all of its schema updates in a single management transaction. I'd be interested to hear more details on this scenario too.
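For reference, here's a minimal sketch of defining several schema elements in one management transaction. The properties file path and the label/key names are just placeholders for illustration, and this assumes a reachable HBase backend:

```java
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import org.janusgraph.core.schema.JanusGraphManagement;

public class SchemaExample {
    public static void main(String[] args) {
        // Path is a placeholder; point it at your own HBase-backed config.
        JanusGraph graph = JanusGraphFactory.open("conf/janusgraph-hbase.properties");
        JanusGraphManagement mgmt = graph.openManagement();
        try {
            // Several schema modifications in the same mgmt transaction...
            mgmt.makeVertexLabel("person").make();
            mgmt.makePropertyKey("name").dataType(String.class).make();
            mgmt.makeEdgeLabel("knows").make();
            // ...committed together in one shot.
            mgmt.commit();
        } catch (RuntimeException e) {
            mgmt.rollback();
            throw e;
        } finally {
            graph.close();
        }
    }
}
```

If committing multiple schema changes together is failing for you, the error message and stack trace would help narrow down what's going on.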

On Thursday, October 5, 2017 at 10:17:06 AM UTC-4, Michele Polonioli wrote:
I have JanusGraph using HBase as the backend storage on a Hadoop cluster.

I need to load a very large quantity of data that represents a social network graph mapped in csv files.
So far I have created a Java program that creates the schema and loads vertices and edges using Gremlin.

The problem is that this method is very slow.

Is there a way to perform bulk loading into HBase in order to significantly reduce the loading times?

The CSV files come from the LDBC SNB data generator (ldbc_snb_datagen).

I'll attach a small portion of the files I need to load and the Java classes that I wrote.
