Janusgraph Hadoop Spark standalone cluster - Janusgraph job always creates constant number 513 of Spark tasks


dimitar....@...
 

Hello,

I have setup Janusgraph 0.4.0 with Hadoop 2.9.0 and Spark 2.4.4 in a K8s cluster.
I connect to Janusgraph from gremlin console and execute: 
gremlin> og
==>graphtraversalsource[hadoopgraph[cassandra3inputformat->gryooutputformat], sparkgraphcomputer]
gremlin> g.V().count()
==>1889

It takes 25min to do the count! The same time took when there were no vertices - e.g. -> 0.  Spark job shows that there were 513 tasks run! Number of task is always constant 513 no matter of the number of vertices.
I have set "spark.sql.shuffle.partitions=4" at spark job's environment, but again the number of Spark tasks was 513! My assumption is that Janusgraph somehow specifies this number of tasks when submits the job to Spark.
The questions are:
- Why Janusgraph job submitted to Spark is always palatalized to 513 tasks? 
- How to manage the number of tasks which are created for a Janusgrap job? 
- How to minimize the execution time of OLAP query for this small graph (OLTP query takes less than a second to execute)?

Thanks,
Dimitar

Join janusgraph-users@lists.lfaidata.foundation to automatically receive all group messages.