Re: Calling a SparkGraphComputer from within Spark


robk...@...

Hi Marc,

Managed to get this working. The final code ended up being simple, but the configuration properties you need to set have changed across different versions of JanusGraph/TinkerPop, and finding the right combination was very difficult. Also, strangely, I needed to use a GraphFactory, not a JanusGraphFactory.

For TinkerPop 3.3 with JanusGraph 1.0-SNAPSHOT (TinkerPop 3.3 patched in via the pom) and Cassandra 2.2.9, the following code works for me within my existing SparkSession:

import org.apache.commons.configuration.BaseConfiguration
import org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer
import org.apache.tinkerpop.gremlin.spark.structure.Spark
import org.apache.tinkerpop.gremlin.structure.util.GraphFactory

// Hand the existing SparkContext to TinkerPop so the OLAP job reuses it
val gremlinSpark = Spark.create(spark.sparkContext)
val sparkComputerConnection = GraphFactory.open(getSparkConfig)
val traversalSource = sparkComputerConnection.traversal().withComputer(classOf[SparkGraphComputer])
println(traversalSource.V().count().next())

sparkComputerConnection.close()
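
Before the close(), other OLAP traversals can run over the same connection. A minimal sketch (the "rank" property key is just an illustration, not anything JanusGraph mandates):

val g = sparkComputerConnection.traversal().withComputer(classOf[SparkGraphComputer])
// pageRank() runs as a vertex program on Spark; by("rank") names the output property
println(g.V().pageRank().by("rank").values[java.lang.Double]("rank").limit(5).toList)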

def getSparkConfig: BaseConfiguration = {
  val conf = new BaseConfiguration()
  conf.setProperty("gremlin.graph", "org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph")
  conf.setProperty("gremlin.hadoop.graphInputFormat", "org.janusgraph.hadoop.formats.cassandra.CassandraInputFormat")

  //    ####################################
  //    # Cassandra Cluster Config         #
  //    ####################################
  conf.setProperty("janusgraphmr.ioformat.conf.storage.backend", "cassandrathrift")
  conf.setProperty("storage.backend", "cassandra")
  conf.setProperty("storage.hostname", "127.0.0.1")
  conf.setProperty("cassandra.input.partitioner.class", "org.apache.cassandra.dht.Murmur3Partitioner")

  //    ####################################
  //    # Spark Configuration              #
  //    ####################################
  conf.setProperty("spark.master", "local[4]")
  conf.setProperty("gremlin.spark.persistContext", "true")
  conf.setProperty("spark.serializer", "org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer")

  conf
}
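
If you'd rather not build the configuration in code, the same keys can live in a properties file loaded via GraphFactory.open(String). A sketch of the equivalent file (the path below is just an example):

# conf/read-cassandra.properties
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.janusgraph.hadoop.formats.cassandra.CassandraInputFormat
janusgraphmr.ioformat.conf.storage.backend=cassandrathrift
storage.backend=cassandra
storage.hostname=127.0.0.1
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
spark.master=local[4]
gremlin.spark.persistContext=true
spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer

val sparkComputerConnection = GraphFactory.open("conf/read-cassandra.properties")

Note that gremlin.spark.persistContext=true is what keeps the Spark context alive between jobs, so TinkerPop reuses the context registered via Spark.create above instead of spinning up its own.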



