JanusGraph 0.5.3 SparkGraphComputer with YARN error: java.lang.ClassCastException: org.apache.hadoop.yarn.proto.YarnServiceProtos$GetNewApplicationRequestProto cannot be cast to org.apache.hadoop.hbase.shaded.com.google.protobuf.Message


kndoan94@...
 

Hello!

I am trying to set up SparkGraphComputer with JanusGraph, using a CQL storage backend and an Elasticsearch index backend. I get an error when running a simple vertex count traversal in the Gremlin console:

gremlin> hadoop_graph = GraphFactory.open('conf/hadoop-graph/olap/olap-cassandra-HadoopGraph-YARN.properties')
gremlin> hg = hadoop_graph.traversal().withComputer(SparkGraphComputer)
gremlin> hg.V().count()
>> ERROR org.apache.spark.SparkContext  - Error initializing SparkContext.
java.lang.ClassCastException: org.apache.hadoop.yarn.proto.YarnServiceProtos$GetNewApplicationRequestProto cannot be cast to org.apache.hadoop.hbase.shaded.com.google.protobuf.Message
Relevant cluster details
  • JanusGraph Version 0.5.3
  • Spark-Gremlin Version 3.4.6 
  • AWS EMR Release 5.23.0
  • Spark Version 2.4.0
  • Hadoop 2.8.5
  • Cassandra/CQL version 3.11.10 
Implementation Details
Properties File
###########
# Gremlin #
###########

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat

gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output


###################
# Index - Elastic #
###################

janusgraphmr.ioformat.conf.index.search.backend=elasticsearch

#Elastic Basic Auth
janusgraphmr.ioformat.conf.index.search.elasticsearch.http.auth.basic.password=[password]
janusgraphmr.ioformat.conf.index.search.elasticsearch.http.auth.basic.username=[username]
janusgraphmr.ioformat.conf.index.search.elasticsearch.http.auth.type=[authtype]


#Hosts
janusgraphmr.ioformat.conf.index.search.hostname=[hosts]
janusgraphmr.ioformat.conf.index.search.index-name=[myindexname]

metrics.console.interval=60000
metrics.enabled=false


#################
# Storage - CQL #
#################

schema.default=none

janusgraphmr.ioformat.conf.storage.backend=cql
janusgraphmr.ioformat.conf.storage.batch-loading=true
janusgraphmr.ioformat.conf.storage.buffer-size=10000
janusgraphmr.ioformat.conf.storage.cql.keyspace=[keyspace]


#HOSTS and PORTS
janusgraphmr.ioformat.conf.storage.hostname=[hosts]
janusgraphmr.ioformat.conf.storage.password=[password]
janusgraphmr.ioformat.conf.storage.username=[username]
cassandra.output.native.port=9042

#InputFormat configuration
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
cassandra.input.widerows=true

####################################
# SparkGraphComputer Configuration #
####################################
spark.master=yarn
spark.submit.deployMode=client

#Spark Job Configurations
spark.sql.shuffle.partitions=1000
spark.dynamicAllocation.enabled=true
spark.shuffle.service.enabled=true
spark.driver.maxResultSize=2G

# Gremlin and Serializer Configuration
gremlin.spark.persistContext=true
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator
spark.executor.extraClassPath=/home/hadoop/janusgraph-full-0.5.3/*

# Special Yarn Configuration (WIP)
spark.yarn.archives=/home/hadoop/janusgraph-full-0.5.3/lib.zip
spark.yarn.jars=/home/hadoop/janusgraph-full-0.5.3/lib/*
spark.yarn.appMasterEnv.CLASSPATH=/etc/hadoop/conf:./lib.zip/*:
spark.yarn.dist.archives=/home/hadoop/janusgraph-full-0.5.3/lib.zip
spark.yarn.dist.files=/home/hadoop/janusgraph-full-0.5.3/lib/janusgraph-cql-0.5.3.jar
spark.yarn.shuffle.stopOnFailure=true


#Spark Driver and Executors
spark.driver.extraLibraryPath=/usr/lib/hadoop/lib/native
spark.executor.extraLibraryPath=/usr/lib/hadoop/lib/native
spark.executor.extraClassPath=janusgraph-cql-0.5.3.jar:./lib.zip/*:./lib/keys/*


In addition, I tried the same configuration file and the same steps listed above on an older AWS EMR release running Hadoop 2.7:
  • JanusGraph Version 0.5.3
  • Spark-Gremlin Version 3.4.6 
  • AWS EMR Release 5.11.4
  • Spark Version 2.2.1
  • Hadoop 2.7.3
  • Cassandra/CQL version 3.11.10 

I received a similar error in the Gremlin console:

java.lang.IllegalStateException: java.lang.ClassCastException: org.apache.hadoop.yarn.proto.YarnServiceProtos$GetNewApplicationRequestProto cannot be cast to org.apache.hadoop.hbase.shaded.com.google.protobuf.Message
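
The target type of the failed cast lives under org.apache.hadoop.hbase.shaded, so one diagnostic step is to check which jars in the JanusGraph lib directory bundle that package. Below is a minimal Python sketch of such a check. The helper name jars_containing is hypothetical, and I am assuming the jars sit directly in the lib directory referenced in the properties file above. It only lists candidate jars; I have not confirmed that removing or excluding any of them resolves the error.

```python
import zipfile
from pathlib import Path


def jars_containing(lib_dir, package_prefix):
    """Return the sorted names of jars in lib_dir that hold entries under package_prefix.

    Hypothetical helper for illustration; it is not part of JanusGraph or Spark.
    """
    # Jar entries use '/' as a separator, so convert the dotted package name.
    entry_prefix = package_prefix.replace(".", "/")
    hits = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if any(name.startswith(entry_prefix) for name in zf.namelist()):
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that carry a .jar extension but are not valid archives.
            continue
    return hits
```

For this setup the call would be jars_containing("/home/hadoop/janusgraph-full-0.5.3/lib", "org.apache.hadoop.hbase.shaded.com.google.protobuf"). Any jar it reports is a candidate for the classpath conflict, since the YARN request class is being cast against the shaded protobuf Message type from that jar.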



Does anyone have tips on resolving this error, or experience running JanusGraph 0.5.3 on AWS EMR?

Thanks!
