Re: Data Loading Script Optimization


hadoopmarc@...
 

Hi Vinayak,

Good to see some progress!

Some suggestions:
  • Is 40% relative to a single core or to all cores (e.g. CPU usage for a java process in top can be 800% if 8 cores are present)?
  • Ncore * 100% is not necessarily the maximum CPU load of the groovy process + storage backend if the loading becomes IO limited. Can you find out what IO usage is?
  • Do you use CompositeIndices on the properties "name" and "e-mail" for the has() filters?
  • Regarding the idea from Nicolas, I would rather use a ConcurrentMap that maps ORG id's to vertex id's, but only fill it as you go for the ORG's that you add or lookup. The JanusGraph transaction and database caches should be large enough to hold the vertices to be referenced two or more times, thus accommodating g.V(id) lookups.
  • On a single system Apache Spark will not help you.

Best wishes,    Marc

Join janusgraph-users@lists.lfaidata.foundation to automatically receive all group messages.