I am trying to set up SparkGraphComputer with JanusGraph, using CQL as the storage backend and Elasticsearch as the index backend, and I receive an error when running a simple vertex count traversal in the Gremlin Console. This is the properties file I am using:
###########
# Gremlin #
###########
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
###################
# Index - Elastic #
###################
janusgraphmr.ioformat.conf.index.search.backend=elasticsearch
#Elastic Basic Auth
janusgraphmr.ioformat.conf.index.search.elasticsearch.http.auth.basic.password=[password]
janusgraphmr.ioformat.conf.index.search.elasticsearch.http.auth.basic.username=[username]
janusgraphmr.ioformat.conf.index.search.elasticsearch.http.auth.type=[authtype]
#Hosts
janusgraphmr.ioformat.conf.index.search.hostname=[hosts]
janusgraphmr.ioformat.conf.index.search.index-name=[myindexname]
metrics.console.interval=60000
metrics.enabled=false
#################
# Storage - CQL #
#################
schema.default=none
janusgraphmr.ioformat.conf.storage.backend=cql
janusgraphmr.ioformat.conf.storage.batch-loading=true
janusgraphmr.ioformat.conf.storage.buffer-size=10000
janusgraphmr.ioformat.conf.storage.cql.keyspace=[keyspace]
#HOSTS and PORTS
janusgraphmr.ioformat.conf.storage.hostname=[hosts]
janusgraphmr.ioformat.conf.storage.password=[password]
janusgraphmr.ioformat.conf.storage.username=[username]
cassandra.output.native.port=9042
#InputFormat configuration
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
cassandra.input.widerows=true
####################################
# SparkGraphComputer Configuration #
####################################
spark.master=yarn
spark.submit.deployMode=client
#Spark Job Configurations
spark.sql.shuffle.partitions=1000
spark.dynamicAllocation.enabled=true
spark.shuffle.service.enabled=true
spark.driver.maxResultSize=2G
# Gremlin and Serializer Configuration
gremlin.spark.persistContext=true
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator
spark.executor.extraClassPath=/home/hadoop/janusgraph-full-0.5.3/*
# Special Yarn Configuration (WIP)
spark.yarn.archives=/home/hadoop/janusgraph-full-0.5.3/lib.zip
spark.yarn.jars=/home/hadoop/janusgraph-full-0.5.3/lib/*
spark.yarn.appMasterEnv.CLASSPATH=/etc/hadoop/conf:./lib.zip/*:
spark.yarn.dist.archives=/home/hadoop/janusgraph-full-0.5.3/lib.zip
spark.yarn.dist.files=/home/hadoop/janusgraph-full-0.5.3/lib/janusgraph-cql-0.5.3.jar
spark.yarn.shuffle.stopOnFailure=true
#Spark Driver and Executors
spark.driver.extraLibraryPath=/usr/lib/hadoop/lib/native
spark.executor.extraLibraryPath=/usr/lib/hadoop/lib/native
spark.executor.extraClassPath=janusgraph-cql-0.5.3.jar:./lib.zip/*:./lib/keys/*
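One thing that stands out in the file above is that `spark.executor.extraClassPath` is set twice, once under the serializer settings and once under the driver and executor settings. How a repeated key is resolved depends on the loader, so I can't say which value the executors actually see. As a sanity check, here is a minimal Python sketch that only reports repeated keys. It is not part of JanusGraph, TinkerPop, or Spark; the function name `find_duplicate_keys` and the sample text are made up for illustration, and it assumes simple `key=value` lines (no `:` separators or line continuations).

```python
from collections import defaultdict


def find_duplicate_keys(text):
    """Return {key: [line numbers]} for keys defined more than once.

    Assumes plain `key=value` lines; ':' separators and backslash
    line continuations are deliberately not handled in this sketch.
    """
    seen = defaultdict(list)
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines and comment lines ('#' or '!').
        if not line or line[0] in "#!":
            continue
        if "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        seen[key].append(lineno)
    return {k: v for k, v in seen.items() if len(v) > 1}


# Hypothetical excerpt built from two lines of the config above.
sample = """\
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.executor.extraClassPath=/home/hadoop/janusgraph-full-0.5.3/*
# Spark Driver and Executors
spark.executor.extraClassPath=janusgraph-cql-0.5.3.jar:./lib.zip/*:./lib/keys/*
"""

duplicates = find_duplicate_keys(sample)
for key, lines in duplicates.items():
    print(f"{key} is set on lines {lines}")
```

To check the real file, the same function can be called on the file's contents, for example `find_duplicate_keys(open(path).read())`, where `path` is wherever the properties file lives.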
In addition, I have tried the same configuration file and the implementation steps listed above on an older AWS EMR release, which was running Hadoop 2.7:
Does anyone have tips for resolving these issues, or experience setting up JanusGraph 0.5.3 on AWS EMR?