JanusGraph (Gremlin Server): improve import performance


Srikanth Rachakonda <srikanth....@...>
 


I'm trying to import 1GB of graph data (~100k vertices, 3.6 million edges) in Gryo format. When I import it through the Gremlin client, I get the following error:

gremlin> graph.io(IoCore.gryo()).readGraph('janusgraph_dump_2020_09_30_local.gryo')
GC overhead limit exceeded
Type ':help' or ':h' for help.
Display stack trace? [yN]y
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at org.cliffc.high_scale_lib.NonBlockingHashMapLong$CHM.<init>(NonBlockingHashMapLong.java:471)
        at org.cliffc.high_scale_lib.NonBlockingHashMapLong.initialize(NonBlockingHashMapLong.java:241)

Gremlin Server and Cassandra details are as follows:

Gremlin-Server:

JanusGraph version: 0.5.2
Gremlin version: 3.4.6

Heap: JAVA_OPTIONS="-Xms4G -Xmx4G …
// gremlin conf
threadPoolWorker: 8
gremlinPool: 16
scriptEvaluationTimeout: 90000
// cql props
query.batch=true
Cassandra is in a cluster with 3 nodes.

Cassandra version: 3.11.0

Node1: RAM: 8GB, Cassandra Heap: 1GB (-Xms1G -Xmx1G)
Node2: RAM: 8GB, Cassandra Heap: 4GB (-Xms4G -Xmx4G)
Node3: RAM: 8GB, Cassandra Heap: 4GB (-Xms4G -Xmx4G)

Gremlin Server is installed on each node (behind a load balancer for clients), but we are executing Gremlin queries on Node1.

Can someone help me with the following:

What do I need to do to import this data (any configuration changes)?

>>> What is the best way to export/import huge data into JanusGraph (Gremlin Server)? (I mainly need an answer to this.)

Is there any way I can export the data in chunks and import it in chunks?

Thanks in advance.

Edit:

I've increased the Gremlin Server heap on Node1 to 2GB, but then the import query gets cancelled. Perhaps there is not enough RAM on the node for both Gremlin Server and Cassandra; that's why I had kept the heap at 1GB, so that the query would at least execute.

Considering huge data (billions of vertices/edges), this hardware is quite modest; I hope 8GB RAM and 2-4 cores per node would be sufficient for each node in the cluster.

Stackoverflow: https://stackoverflow.com/questions/64239361/janusgraphgremlinserver-import-improve-performance


HadoopMarc <bi...@...>
 

Hi Srikanth,

No answers, but just a few remarks that might be helpful:
  • JanusGraph does not support bulk loading other than through the directions in the ref docs and the graph and traversal-source APIs.
  • How did you get the 1GB Gryo file in the first place? Apparently the host that generated it was able to hold the graph in memory. That same host could generate smaller chunks, e.g. using the Gremlin subgraph step before doing the IO write.
  • The TinkerPop docs advise the following:
    g.io(someInputFile).with(IO.reader, IO.gryo).read().iterate()
    Now, instead of iterate(), you can keep calling next() for as long as your memory allows.
  • Use of Gremlin Server is not needed if you use the JVM for bulk loading. I guess many JanusGraph teams do bulk loading with a lot of Java clients and threads with embedded JanusGraph. Apache Spark is often used to run the Java clients.
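[Editor's note: the second and third points above might be combined into a chunked export/import along the following lines. This is a hypothetical console sketch, not from the thread: the edge labels 'knows' and 'created' and the chunk file names are placeholders, and it assumes the exporting host can hold the full graph in memory.]

```groovy
// Export: on the host that can hold the full graph, write label-sized
// chunks instead of one big file (labels here are hypothetical examples).
['knows', 'created'].each { label ->
    def sub = g.E().hasLabel(label).subgraph('sub').cap('sub').next()
    sub.io(IoCore.gryo()).writeGraph("chunk_${label}.gryo")
}

// Import: read the chunks back one at a time and commit between files,
// so the heap only needs to hold one chunk plus its transaction.
['knows', 'created'].each { label ->
    g.io("chunk_${label}.gryo").with(IO.reader, IO.gryo).read().iterate()
    g.tx().commit()
}
```

With this scheme the largest chunk, rather than the whole graph, bounds the memory needed on the importing side.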
Cheers,     Marc
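[Editor's note: as an illustration of Marc's last point, bulk loading with embedded JanusGraph might be sketched as follows. This is a minimal, hypothetical example, not from the thread: the properties file path, vertex label, property key, counts, and batch size are all placeholders. Enabling storage.batch-loading in the JanusGraph properties is also commonly recommended for this kind of load.]

```java
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import org.janusgraph.core.JanusGraphTransaction;
import org.apache.tinkerpop.gremlin.structure.Vertex;

public class BulkLoader {
    public static void main(String[] args) {
        // Open the graph in-process, no Gremlin Server involved.
        JanusGraph graph = JanusGraphFactory.open("conf/janusgraph-cql.properties");
        int batchSize = 10_000;  // commit every N vertices to bound heap use
        JanusGraphTransaction tx = graph.newTransaction();
        for (int i = 0; i < 100_000; i++) {
            Vertex v = tx.addVertex("person");  // hypothetical label
            v.property("idx", i);               // hypothetical property
            if ((i + 1) % batchSize == 0) {
                tx.commit();                    // flush this batch to Cassandra
                tx = graph.newTransaction();
            }
        }
        tx.commit();
        graph.close();
    }
}
```

Several such loaders can run in parallel threads or processes, each working on its own slice of the input.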

On Wednesday, 7 October 2020 at 16:24:34 UTC+2, srik...@... wrote:





Stephen Mallette <spmal...@...>
 

I offered some thoughts on this on SO (see the Stack Overflow link above):


On Sat, Oct 10, 2020 at 9:52 AM HadoopMarc <bi...@...> wrote:

