Re: Janus Graph performing OLAP with Spark/Yarn


I think there are many success stories/snippets out there on this, but no consolidated how-to that I'm aware of. Marc, I'm pretty sure I've seen plenty of examples from you on this across various lists over the years. I can contribute a couple of examples as well if we can get some documentation started on this under JanusGraph. I've had success getting traversals and vertex programs working using Titan SparkGraphComputer with HBase on both TinkerPop-3.0.1/Spark-1.2 (Yarn/Cloudera) and TinkerPop-3.2.3/Spark-1.6 (Yarn/Cloudera and Mesos), but I haven't tested this with JanusGraph yet. Personally, I'd recommend you consider running Spark on Mesos instead of Yarn if possible. The configuration is easier in my opinion, and you can have apps running against different versions of Spark, which makes hardware and software updates much easier and less disruptive.

A few notes in case they're helpful. The first is probably obvious, but I always match the server Spark version exactly to the Spark version bundled with TinkerPop in the relevant JanusGraph/Titan distribution. Also, I've found the spark.executor.extraClassPath property to be crucial to getting things working with both Yarn and Mesos. Jars listed there are placed at the start of the executor classpath, which matters when the cluster may have conflicting versions of core/transitive dependencies. I'll usually build a single jar with all dependencies (excluding Spark), put it somewhere accessible on all cluster nodes, and then point spark.executor.extraClassPath at it.
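
To make the classpath note concrete, here is a rough sketch in Python that just prints the two Spark lines I'd add to the graph's properties file. The property names spark.master and spark.executor.extraClassPath are real Spark settings; the helper name, the "yarn-client" master value (the Spark 1.x style), and the jar path are placeholders I made up for illustration, so adjust them for your cluster.

```python
def spark_properties(master, fat_jar_path):
    """Return the Spark-related lines for a graph properties file.

    fat_jar_path is the single all-dependencies jar (excluding Spark),
    which must exist at the same location on every cluster node.
    """
    props = {
        "spark.master": master,
        # Entries here are prepended to the executor classpath, so they
        # win over conflicting versions already installed on the cluster.
        "spark.executor.extraClassPath": fat_jar_path,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    # Hypothetical values: replace with your own master and jar location.
    print(spark_properties("yarn-client", "/opt/graph/lib/graph-all-deps.jar"))
```

The printed lines go into the same properties file you hand to the graph factory, next to the rest of your Hadoop/HBase input settings.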

