Threads are unresponsive for some time after a particular amount of data transfer (119 MB)


Vinayak Bali
 

Hi All,

We are connecting to JanusGraph from Java, using a cluster connection through the Gremlin driver. Initially we were getting an out-of-memory error, but changing some settings in gremlin-server.yaml resolved it.
The issue raised on StackOverflow: 


Changes made in gremlin-server.yaml:
writeBufferLowWaterMark: 9500000
writeBufferHighWaterMark: 10000000
Every query now gets stuck at 119 MB for approximately 5 minutes and then starts working again.
Attaching a screenshot of the error.

Gremlin server configurations:

maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 16384
maxContentLength: 2000000000
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 128
writeBufferLowWaterMark: 9500000
writeBufferHighWaterMark: 10000000
threadPoolWorker: 30
gremlinPool: 0

How can this issue be solved?

Thanks & Regards,
Vinayak


hadoopmarc@...
 

Hi Vinayak,

As the link shows, this is an issue in TinkerPop, so it cannot be solved here. Of course, you can look for workarounds. As sending result sets of multiple hundreds of MB is not a typical client operation, you might consider opening the graph in embedded mode, that is, without using Gremlin Server.

Best wishes,   Marc


Vinayak Bali
 

Hi Marc,

I went through some blogs but did not find a way to connect to JanusGraph in embedded mode from Java. We are using Cassandra as the backend and CQL to connect to it. I am not sure how to achieve the following:
1. Connecting to JanusGraph from Java in embedded mode, with the data already present in Cassandra (CQL).
2. Is there any way to load the data from Cassandra into memory?
Please share blogs or other approaches so that I can test the above.

Thanks & Regards,
Vinayak



hadoopmarc@...
 

Hi Vinayak,

For embedded use of janusgraph, see:
https://docs.janusgraph.org/getting-started/basic-usage/#loading-with-an-index-backend
and replace the properties file with the one currently used by gremlin server.

With embedded use, you can simply do (if your graph is not too large):
vertices = g.V().toList()
edges = g.E().toList()
subGraph = g.E().subgraph('sub').cap('sub').next()

Best wishes,   Marc
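
For orientation, a properties file for a Cassandra (CQL) backend typically contains entries like the ones below. This is only a sketch: the hostname and keyspace values are placeholders, not taken from this thread, and the file already used by Gremlin Server remains the authoritative source.

```properties
# Storage backend: Cassandra accessed over CQL
storage.backend=cql
# Placeholder contact point; use the Cassandra host(s) from the Gremlin Server setup
storage.hostname=127.0.0.1
# Placeholder keyspace; it must match the keyspace that already holds the data
storage.cql.keyspace=your_keyspace
```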


Vinayak Bali
 

Hi Marc,

I am using cluster mode to connect to JanusGraph after building the Gremlin query. A sample of the code is as follows:
Cluster cluster = Cluster.build().addContactPoint("xx.xx.xx.xx")
        .port(8182)
        .serializer(serializer)
        .resultIterationBatchSize(512)
        .maxContentLength(maxContentLength)
        .create();
Client connect = cluster.connect();
ResultSet submit = connect.submit(gremlin, options);
I went through many blogs but found no useful information on running it in embedded mode. Also, is there any way to load data from the backend into memory to speed up performance?
Could you guide me on the following:
1. Connecting in embedded mode to bypass the Gremlin driver issue.
2. Loading data from the backend into memory to speed up performance.

Thanks & Regards,
Vinayak



hadoopmarc@...
 

Hi Vinayak,

From gremlin console:

gremlin> graph = JanusGraphFactory.open('conf/your_config_with_right_keyspace.properties')
==>standardjanusgraph[cassandra:]
gremlin> g = graph.traversal()
gremlin> vertices = g.V().toList()


Vinayak Bali
 

Hi Marc,

That is where the problem lies: I am not using the Gremlin Console. I want to execute queries through an API built in Java.

Thanks & Regards,
Vinayak



Boxuan Li
 

Hi Vinayak,


Using the Java APIs is not that different from using the Gremlin Console, where you write Groovy code.

Cheers,
Boxuan
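
As a rough illustration of the point above, the console calls shown earlier in the thread (JanusGraphFactory.open, graph.traversal(), g.V().toList()) can be written almost unchanged in Java once janusgraph-core is on the classpath. The sketch below uses only the JDK: it checks that the properties file names the expected backend and keyspace before it is handed to JanusGraph. The class and method names are hypothetical, and the keys `storage.backend` and `storage.cql.keyspace` are assumed to be the ones present in the file used by Gremlin Server.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper, not part of JanusGraph: inspects a properties file
// before it is used to open the graph in embedded mode.
public class EmbeddedConfigCheck {

    /** Loads a JanusGraph properties file from disk. */
    public static Properties load(Path file) throws IOException {
        Properties props = new Properties();
        try (Reader reader = Files.newBufferedReader(file)) {
            props.load(reader);
        }
        return props;
    }

    /** Summarizes the storage settings; fails if no backend is configured. */
    public static String describe(Properties props) {
        String backend = props.getProperty("storage.backend");
        if (backend == null) {
            throw new IllegalArgumentException("storage.backend is not set");
        }
        // Assumed key for the CQL keyspace; "(default)" is printed when it is absent.
        String keyspace = props.getProperty("storage.cql.keyspace", "(default)");
        return "backend=" + backend + ", keyspace=" + keyspace;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0
                ? args[0]
                : "conf/your_config_with_right_keyspace.properties");
        System.out.println(describe(load(file)));

        // Next step, assuming janusgraph-core is on the classpath
        // (the same calls as in the console session):
        //   JanusGraph graph = JanusGraphFactory.open(file.toString());
        //   GraphTraversalSource g = graph.traversal();
        //   List<Vertex> vertices = g.V().toList();
        //   graph.close();
    }
}
```

Running the class with the path to the Gremlin Server properties file prints the backend and keyspace it would connect to, which is a cheap way to confirm the "right keyspace" before loading any vertices.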
