Performance Improvement


Vinayak Bali
 

Hi All, 

I have been using JanusGraph for a while. My use case consists of 1.5 million nodes and 3 million edges. I prepared a batch-loading Groovy script, and its loading performance is as follows:

Nodes: 5 mins
Edges: 13 mins
Total: 18 mins

Also, a count query that includes edges takes minutes to execute.
Both JanusGraph (0.5.2) and Cassandra are installed on the same instance.
 
Hardware Configuration:
RAM: 92 GB
Cores: 48 

I would like expert suggestions or steps that I can follow to improve performance. Please share your thoughts.

Thanks & Regards,
Vinayak


Boxuan Li
 

Hi Vinayak,

Would you be able to build JanusGraph from the master branch and try again? The upcoming 0.6.0 release contains many optimizations that might be helpful.

Without knowing more details of your use case (your queries, your loading script, your JanusGraph configs, your JanusGraph metrics, your Cassandra metrics), it’s very hard to give any concrete suggestion. Anyway, I would strongly recommend you try out the master version first and see how it goes.

Best,
Boxuan

On Monday, July 26, 2021 at 3:55 PM, Vinayak Bali <vinayakbali16@...> wrote:



Vinayak Bali
 

Hi Boxuan, 

Thank you for your response. I am not sure how to build JanusGraph from the master branch. If you can share the steps or procedure, I can try it; otherwise I will need to wait for the new release.

My use case consists of a single node label with a self-relation between nodes of that label. You can think of it as a BOM (bill of materials) in a supply chain.
The JanusGraph and Cassandra configurations are the defaults set at installation.

The data loading script takes the CSV files as input, splits them into batches, and loads the batches using multiple threads. If you need more details, I can share a generic version of the script along with the metrics.
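
The batching scheme described above can be sketched roughly as follows. This is only an illustration in Java, not the actual Groovy script, which was not shared in this thread. The names `BatchLoader`, `partition`, `loadInParallel`, and `loadBatch` are hypothetical, and the JanusGraph write itself is left as a callback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper: splits parsed CSV rows into batches and loads them concurrently.
class BatchLoader {

    // Splits rows into consecutive batches of at most batchSize rows each.
    static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            int end = Math.min(i + batchSize, rows.size());
            batches.add(new ArrayList<>(rows.subList(i, end)));
        }
        return batches;
    }

    // Submits every batch to a fixed-size thread pool, waits for all of them,
    // and returns the total number of rows handed to loadBatch.
    // Assumption: in a JanusGraph loader, loadBatch would open one transaction
    // per batch, add the vertices or edges, and commit. That part is omitted here.
    static <T> int loadInParallel(List<T> rows, int batchSize, int threads,
                                  Consumer<List<T>> loadBatch) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<T> batch : partition(rows, batchSize)) {
                results.add(pool.submit(() -> {
                    loadBatch.accept(batch);
                    return batch.size();
                }));
            }
            int loaded = 0;
            for (Future<Integer> result : results) {
                loaded += result.get(); // blocks until that batch has finished
            }
            return loaded;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("Interrupted while loading batches", e);
        } catch (ExecutionException e) {
            throw new RuntimeException("A batch failed to load", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With this shape, batch size and thread count can be varied independently, which makes it easier to see whether load time is limited by the client threads or by the storage backend.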

Thanks & Regards,
Vinayak

On Mon, Jul 26, 2021 at 1:38 PM Boxuan Li <liboxuan@...> wrote:


Laura Morales
 

There's a BUILDING file with instructions in the repo.
 
 
 

Sent: Monday, July 26, 2021 at 10:31 AM
From: "Vinayak Bali" <vinayakbali16@gmail.com>
To: janusgraph-users@lists.lfaidata.foundation
Subject: Re: [janusgraph-users] Performance Improvement



Vinayak Bali
 

Laura, that is helpful. I will go through it and try it out.

Also, if there are any configurations that can be tuned for better performance, please share them.

On Mon, Jul 26, 2021 at 2:22 PM Laura Morales <lauretas@...> wrote:


Oleksandr Porunov
 

Hi Vinayak,

JanusGraph 0.6.0 has been released. I posted some quick tips for improving throughput to CQL storage here:
https://lists.lfaidata.foundation/g/janusgraph-users/message/6148
I also posted on LinkedIn with links to the relevant parts of the documentation and some further suggestions about internal ExecutorService usage here: https://www.linkedin.com/posts/porunov_release-060-janusgraphjanusgraph-activity-6840714301062307840-r6Uw

In 0.6.0 you can drastically improve CQL throughput with a single setting, `storage.cql.executor-service.enabled: false`, which I definitely recommend. If you disable it, make sure the throughput-related options are configured properly.
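
As a minimal sketch, the setting could be placed in the JanusGraph properties file like this. The `storage.backend` and `storage.hostname` lines are assumptions for a single-machine setup like the one described in this thread. The other throughput-related options are not shown because the message does not name them.

```properties
# Assumed: CQL backend running on the same machine as JanusGraph
storage.backend=cql
storage.hostname=127.0.0.1

# From the message above (JanusGraph 0.6.0)
storage.cql.executor-service.enabled=false
```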

Best regards,
Oleksandr