Re: Data Loading Script Optimization


Vinayak Bali
 

Hi Marc, 

The storage backend used is Cassandra. 
Yes, the storage backend, JanusGraph, and the load script all run on the same server.
I have specified storage.batch-loading=true.
CPU usage is very low, no more than 3 percent, although the machine has a high-end hardware configuration. So I need suggestions on how to make full use of the hardware.
I will replace line 121 of the code with graph.newTransaction().traversal() and share the results.
Current line:  g = ctx.g = graph.traversal();
Modified line: g = ctx.g = graph.newTransaction().traversal();
Please validate and confirm the changes. 
As the data grows, we should use the global GraphTraversalSource g at the bottom of the script for the bulk loading.
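
To make the one-transaction-per-thread idea concrete, here is a minimal sketch of how batches could be loaded in parallel so that more than one CPU core is busy. It is not the code from the load script: Tx and TxFactory are hypothetical stand-ins (in the real script, open() would presumably wrap graph.newTransaction() and addRow() would run the traversal for one CSV row), and the batch size and thread count are arbitrary. Whether this actually raises throughput depends on the script and on Cassandra.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelBatchLoad {

    /** Hypothetical stand-in for one transaction (e.g. what graph.newTransaction() returns). */
    interface Tx {
        void addRow(String row); // real script: tx.traversal().addV(...)...iterate()
        void commit();
    }

    /** Hypothetical factory; in the real script this would wrap graph.newTransaction(). */
    interface TxFactory {
        Tx open();
    }

    /**
     * Splits rows into batches and loads each batch in its own transaction
     * on a worker thread. Returns the number of rows committed.
     */
    static int loadInParallel(List<String> rows, int batchSize, int threads,
                              TxFactory factory) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int start = 0; start < rows.size(); start += batchSize) {
                int end = Math.min(start + batchSize, rows.size());
                List<String> batch = rows.subList(start, end);
                results.add(pool.submit(() -> {
                    Tx tx = factory.open();      // one transaction per batch, never shared
                    for (String row : batch) {
                        tx.addRow(row);
                    }
                    tx.commit();                 // commit once per batch
                    return batch.size();
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get();                // rethrows any failure from a worker
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            rows.add("row-" + i);
        }
        AtomicInteger commits = new AtomicInteger();
        // Fake transaction that only counts commits, so the sketch runs without JanusGraph.
        TxFactory fake = () -> new Tx() {
            public void addRow(String row) { }
            public void commit() { commits.incrementAndGet(); }
        };
        int loaded = loadInParallel(rows, 100, 4, fake);
        System.out.println(loaded + " rows in " + commits.get() + " commits");
        // prints "1000 rows in 10 commits"
    }
}
```

The point of the sketch is only that no transaction is shared between threads and that each batch is committed once; the right batch size and number of threads would have to be found by measuring.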

Thanks & Regards,
Vinayak


On Sat, Aug 7, 2021 at 6:21 PM <hadoopmarc@...> wrote:
Hi Vinayak,

What storage backend do you use? Do I understand correctly that the storage backend and the load script all run on the same server? If so, are all available CPU resources actively used during batch loading? What is the CPU usage of the Groovy process, and what is that of the storage backend?

Specific details in the script:
  • Did you specify storage.batch-loading=true?
  • I am not sure whether each traversal() call on the graph gets its own thread-independent transaction (that is why I ask for the Groovy CPU usage). Maybe you need g = graph.newTransaction().traversal() in CsvImporter.
  • I assume that the global GraphTraversalSource g at the bottom of the script is not used for the bulk loading.
Best wishes,
Marc
