Re: Janusgraph Hadoop Spark standalone cluster - Janusgraph job always creates constant number 513 of Spark tasks
marc.d...@...
Hi Dimitar,

Your Spark screenshots do not show any parallelism. You state that your Spark cluster has only a single worker, and it seems that this worker also has only one core available (the spark.executor.cores property is not specified, so by default all available worker cores would be offered to SparkGraphComputer). Without any parallelism, a Spark job will never be faster than the same application without Spark.

That being said, I do not understand why a single task takes 2 seconds. Retrieving 4 rows on average from Cassandra should take roughly 40 ms, so we are two orders of magnitude away from that. Apparently each task has some unexplained overhead, for setting up ... what? I would expect that the Spark worker keeps its JVM alive, and that SparkGraphComputer keeps its classes loaded and its Cassandra connection established between tasks.

What I also do not understand is why the successive 23-minute jobs are scheduled with a large delay in between. Is the underlying cloud not available at times? Would that also mean that the vcores used by the Spark worker have very low performance?

I would first try some simple Spark jobs for a test application (no JanusGraph, no Cassandra) and make sure that you have a standalone Spark cluster that behaves as expected: parallelism visible in the executor tab of the Spark UI, and no strange waiting periods between jobs of a single application.

Cheers, Marc

On Monday, October 21, 2019 at 11:23:26 UTC+2, Dimitar Tenev wrote:
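For reference, a minimal sketch of the SparkGraphComputer properties that control executor parallelism in standalone mode (the file name, host name, and core/memory values are assumptions; adapt them to your cluster):

```
# conf/hadoop-graph/read-cassandra.properties (file name is an assumption)
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
spark.master=spark://your-master-host:7077

# If spark.executor.cores is left unset, a single executor claims all cores
# of a worker in standalone mode; set it explicitly to control how many
# cores each executor uses (and thus how many tasks run in parallel).
spark.executor.cores=4
spark.executor.memory=4g
```

After changing these values, the executor tab of the Spark UI should show the expected number of cores per executor while a job is running.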