Re: [BLOG] Configuring JanusGraph for spark-yarn

HadoopMarc <bi...@...>

Hi John,

TinkerPop's spark-gremlin module depends on spark-1.6.1, so when you install spark-gremlin in the gremlin-console or add it to your Maven project, spark-core-1.6.1.jar is already on your classpath. The configs in my recipe make sure all dependencies are also available to the spark-1.6.1 executors and application master on a Hadoop YARN cluster. The cluster's spark-1.6.2 jars are never loaded when the gremlin-console is used as in my recipe.
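If you want to verify on your own setup which Spark build is actually loaded, you can ask the JVM where a Spark class came from. This is a minimal sketch, assuming you run it with the same classpath as your gremlin-console or Maven project. `WhichJar` and `locate` are hypothetical names; `Class.forName` and `getProtectionDomain().getCodeSource()` are standard JDK calls.

```java
import java.security.CodeSource;

/**
 * Hypothetical helper: reports the jar (or directory) a class was loaded from,
 * e.g. to check that org.apache.spark.SparkContext comes from spark-core-1.6.1
 * (pulled in by spark-gremlin) rather than from the cluster's Spark install.
 */
public class WhichJar {

    /** Returns the code-source location of the named class, or a short explanation. */
    static String locate(String className) {
        try {
            // initialize=false: only load the class, do not run its static initializers
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                // Classes from the JDK bootstrap loader have no code source
                return "bootstrap/JDK (no code source)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.apache.spark.SparkContext";
        System.out.println(name + " -> " + locate(name));
    }
}
```

This only covers the client JVM; for the executors, the Environment tab of the Spark UI for the running application lists the classpath entries.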

Using spark-submit would put spark-1.6.2 on the various classpaths. That would probably also work, provided it did not cause version conflicts between the TinkerPop dependencies and the Spark dependencies.

Also, I believe your implicit assumption, that it would be bad practice to put spark-1.6.2 jars on the classpath of a spark-1.6.1 application, is not valid. Spark-1.6.2 should support all APIs that a spark-1.6.1 application can depend on, because the two differ only by a maintenance release.
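The rule of thumb in the previous paragraph can be written down as a small check. This is a simplified sketch, assuming Spark keeps its APIs stable across maintenance releases within one major.minor line; `SparkVersions` and `sameFeatureLine` are hypothetical names, not part of Spark or TinkerPop.

```java
/**
 * Hypothetical, simplified check: two Spark versions are treated as
 * API-compatible when they share major and minor numbers (e.g. 1.6.1 vs 1.6.2),
 * i.e. they differ only by a maintenance release.
 */
public class SparkVersions {

    /** Parses the first three numeric parts; suffixes like "-1245" are ignored. */
    static int[] parse(String version) {
        String core = version.split("[-+]", 2)[0];
        String[] parts = core.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Unexpected version: " + version);
        }
        int[] out = new int[3];
        for (int i = 0; i < Math.min(3, parts.length); i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    /** True when major and minor match, so only the maintenance number may differ. */
    static boolean sameFeatureLine(String built, String runtime) {
        int[] a = parse(built);
        int[] b = parse(runtime);
        return a[0] == b[0] && a[1] == b[1];
    }

    public static void main(String[] args) {
        System.out.println(sameFeatureLine("1.6.1", "1.6.2")); // true
        System.out.println(sameFeatureLine("1.6.1", "2.0.0")); // false
    }
}
```

Vendor builds with longer version strings (an HDP-style "1.6.2.2.5.0.0-1245", for instance) are reduced to their first three numbers before comparing.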

I hope this clarifies things; configuring complex JVM apps is not for the faint-hearted.


On Tuesday, September 26, 2017 21:58:45 UTC+2, John Helmsen wrote:


This sounds like it could be really good news, but please clarify something for me:

TinkerPop 3.2.3 claims compatibility only with Spark 1.6.1, and JanusGraph-0.1.1 currently supports only up to TinkerPop 3.2.3, so I assumed that JanusGraph would support only Spark 1.6.1.

Now I have two interpretations of your post that I need clarified:

1) You have made Spark 1.6.2 work (actually do computations) with JanusGraph-0.1.1.
2) There is a version of Spark 1.6.1 also on the cluster, and it is being called by JanusGraph-0.1.1 while Spark 1.6.2 is being ignored.

Either one is a workable option for me, but please elaborate so I am completely clear about what is happening.

On Tuesday, September 26, 2017 at 3:30:40 PM UTC-4, HadoopMarc wrote:

Hi John

The funny thing is, the recipe does not use the HDP Spark installation at all! SparkGraphComputer creates a SparkContext and has YARN start all the Spark machinery, so the Spark version installed on the cluster does not matter, though Spark 2.x requires some other config properties (see the recent PRs in the TinkerPop GitHub repository).

The only interaction with the cluster's Spark is through the Spark History Server, and I did not notice any problems there between Spark 1.6.1 and Spark 1.6.2. See your cluster's spark-defaults.conf for the history configs.
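To have the application show up in the cluster's History Server, the client-side history settings from the cluster's Spark defaults file (spark-defaults.conf) can be copied into the properties file you hand to SparkGraphComputer. This is a minimal sketch, assuming the relevant keys are the standard `spark.eventLog.*` and `spark.yarn.historyServer.*` settings; `HistoryConfig` and `historyEntries` are hypothetical names, and the sample values are made up.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical helper: picks the history-related entries out of a spark-defaults.conf. */
public class HistoryConfig {

    // Client-side settings that let a Spark application appear in the History Server (assumption)
    static final String[] PREFIXES = {"spark.eventLog.", "spark.yarn.historyServer."};

    static Map<String, String> historyEntries(Reader in) {
        Properties all = new Properties();
        try {
            all.load(in); // java.util.Properties syntax: "key value", "key=value" and # comments
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> picked = new TreeMap<>(); // sorted for stable output
        for (String key : all.stringPropertyNames()) {
            for (String prefix : PREFIXES) {
                if (key.startsWith(prefix)) {
                    picked.put(key, all.getProperty(key).trim());
                    break;
                }
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "# excerpt in spark-defaults.conf style (made-up values)",
            "spark.eventLog.enabled true",
            "spark.eventLog.dir hdfs:///spark-history",
            "spark.yarn.historyServer.address historyhost:18080",
            "spark.driver.memory 1g");
        // Print as key=value lines, ready to paste into the graph properties file
        historyEntries(new StringReader(sample))
            .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

`java.util.Properties` is used here because it accepts both whitespace and `=` as separators, so the same helper works on either style of defaults file.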

Have fun!


On Monday, September 25, 2017 23:17:53 UTC+2, John Helmsen wrote:

Thank you so much for the help in getting Spark 1.6.1 to work with JanusGraph.  We've gotten good use out of it, but now we come to a crossroads.

Our customer wants us to deploy it on their cluster, but their cluster runs Spark 1.6.2. I noticed that you confirmed Spark-YARN-JanusGraph working on an HDP 2.5 stack, which typically runs 1.6.2. Does the setup we've already gone through transfer to 1.6.2? If there are problems, what do you anticipate they might be?

On Thursday, July 6, 2017 at 4:15:37 AM UTC-4, HadoopMarc wrote:

Readers wanting to run OLAP queries on a real spark-yarn cluster might want to check my recent post:

Regards,  Marc
