We are having similar issues with performance loading graph data
into JanusGraph backed by HBase. I agree with Jason; we didn't have
any issues doing all the mgmt calls in one go.
One thing that we did was to multi-thread the Java code, which
certainly helped performance. HBase seems to respond well to
multiple concurrent calls. For example, in your loadVerticies
method, you could spawn a thread inside the main for loop and
hand the work to a pool of maybe 32 threads (it depends on the
machine you're running on). I use the Java ExecutorService, like:
ExecutorService doWork =
    Executors.newFixedThreadPool(MAX_WORK_CALLS);
Semaphore smDoWork = new Semaphore(MAX_WORK_CALLS);

try {
    smDoWork.acquire();   // blocks until one of the MAX_WORK_CALLS slots is free
} catch (InterruptedException ex) {
    log.error("Interrupt: " + ex);
}
someThread = new doJanusStuff(this);
doWork.execute(someThread);
Just make sure to release the semaphore when the thread
completes (ideally in a finally block, so a failed task can't
leak a slot).
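To show the whole pattern end to end, here is a minimal, self-contained sketch of that bounded-submission approach. The class name BoundedIngest and the task body are just illustrative stand-ins for the real Janus work:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedIngest {
    static final int MAX_WORK_CALLS = 32;

    // Submits 'tasks' jobs, never allowing more than MAX_WORK_CALLS in flight.
    static int runIngest(int tasks) throws InterruptedException {
        ExecutorService doWork = Executors.newFixedThreadPool(MAX_WORK_CALLS);
        Semaphore smDoWork = new Semaphore(MAX_WORK_CALLS);
        AtomicInteger completed = new AtomicInteger();

        for (int i = 0; i < tasks; i++) {
            smDoWork.acquire();              // block while 32 tasks are already running
            doWork.execute(() -> {
                try {
                    completed.incrementAndGet();  // stand-in for the real Janus work
                } finally {
                    smDoWork.release();      // always free the slot, even on failure
                }
            });
        }
        doWork.shutdown();
        doWork.awaitTermination(1, TimeUnit.MINUTES);
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runIngest(100));  // prints 100
    }
}
```

The semaphore is what keeps the submitting loop from racing ahead of the pool; without it, a fast CSV reader can queue millions of pending tasks and exhaust memory.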
All that said, performance was then limited by the one machine
doing the ingesting, and it still seemed slower than one would
expect. In our case, generating a graph of 154 million vertices
and ~275 million edges took 3 days on a 5-node Hadoop cluster.
-Joe
On 10/6/2017 10:30 AM, Jason Plurad
wrote:
Thanks for providing the code. It would be even
better if you shared everything as a GitHub project that's easy
to clone and build, contains the CSV files, and also includes the
specific parameters you're sending into the program, like batchSize.
You didn't mention how slow is slow. What is the ingestion rate
for vertices, properties, and edges? Some more concrete details
would be helpful. What does your HBase deployment look like?
* Note: For unknown reasons, it seems that each
modification to the
* schema must be committed in its own transaction.
I noticed this comment in the code. I don't think that's true;
GraphOfTheGodsFactory does all of its schema updates in one mgmt
transaction. I'd be interested to hear more details on this
scenario too.
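For reference, the single-transaction pattern looks roughly like this (a sketch using the standard JanusGraph schema API; the label and key names here are just illustrative, and it assumes a properties file pointing at your HBase backend):

```java
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import org.janusgraph.core.schema.JanusGraphManagement;

JanusGraph graph = JanusGraphFactory.open("conf/janusgraph-hbase.properties");

// All schema elements defined in one management transaction...
JanusGraphManagement mgmt = graph.openManagement();
mgmt.makeVertexLabel("person").make();
mgmt.makePropertyKey("name").dataType(String.class).make();
mgmt.makeEdgeLabel("knows").make();
// ...and committed together.
mgmt.commit();
```

If committing several schema elements in one transaction fails for you, the error message and stack trace would help narrow down what's different about your setup.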
On Thursday, October 5, 2017 at 10:17:06 AM UTC-4, Michele
Polonioli wrote:
I have JanusGraph using HBase as backend
storage on a Hadoop cluster.
I need to load a very large quantity of data that
represents a social network graph mapped in csv files.
So far I have created a Java program that creates the schema
and loads vertices and edges using Gremlin.
The problem is that this method is very slow.
Is there a way to perform bulk loading into Hbase in
order to significantly reduce the loading times?
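(One thing worth checking, in case you haven't already: JanusGraph has a bulk-loading mode that relaxes consistency checks and reduces ID-allocation overhead during large ingests. In the graph's properties file, something like the following; the ids.block-size value is only an example and should be tuned to your load:)

```
storage.batch-loading=true
ids.block-size=100000
```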
The CSV files come from ldbc_snb_datagen:
https://github.com/ldbc/ldbc_snb_datagen
I'll attach a small portion of the files I need to load and
the Java classes that I wrote.
Thanks.
--
You received this message because you are subscribed to the Google
Groups "JanusGraph users" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-users/bb4a6e00-b069-4c5b-a87c-77580decde75%40googlegroups.com.