Re: [BLOG] Configuring JanusGraph for spark-yarn


Joe Obernberger <joseph.o...@...>
 

Marc - thank you for posting this.  I'm trying to get this to work with our CDH 5.10.0 distribution, but I've run into an issue; first, though, a question.  I'm using a 5-node cluster, and I think I do not need to set zookeeper.znode.parent, since the HBase configuration is in /etc/hbase/conf.  Is that correct?
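For what it's worth, if the parent znode did need to be set, I believe it would be passed through to the HBase client via the input-format prefix, something like this (a sketch; /hbase is the CDH default, and the storage.hbase.ext pass-through is my assumption):

```
# Assumed pass-through of an HBase client setting; /hbase is the CDH default
janusgraphmr.ioformat.conf.storage.hbase.ext.zookeeper.znode.parent=/hbase
```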

The error that I'm getting is:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 10, host002, executor 1): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.x$330 of type org.apache.spark.api.java.function.PairFunction in instance of org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1
        at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
        at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2238)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2156)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2232)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2156)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2232)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2112)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2232)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2156)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:423)
        at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
        at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2123)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2232)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2156)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2232)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2156)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:423)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:64)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Given this post:
https://stackoverflow.com/questions/28186607/java-lang-classcastexception-using-lambda-expressions-in-spark-job-on-remote-ser

It looks like I'm not including a necessary jar, but I'm at a loss as to which one.  Any ideas?
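The stack trace suggests the executors are deserializing the lambda against a different spark-core build than the driver serialized it with, so the first thing I'm checking is whether the spark-core jar shipped inside lib.zip matches the CDH parcel jar (a sketch; the paths are from my install, and the vanilla jar name below is an assumption):

```shell
# Compare the spark-core jar shipped to executors in lib.zip with the
# cluster's CDH build, e.g.:
#   unzip -l /home/graph/janusgraph-0.2.0-SNAPSHOT-hadoop2.JOE/lib.zip | grep spark-core
#   ls /opt/cloudera/parcels/CDH/jars/ | grep spark-core
# Tiny helper to pull the version string out of a spark-core jar name:
spark_ver() { basename "$1" .jar | sed 's/^spark-core_//'; }

spark_ver spark-core_2.10-1.6.0-cdh5.10.0.jar   # prints 2.10-1.6.0-cdh5.10.0 (cluster build)
spark_ver spark-core_2.10-1.6.0.jar             # prints 2.10-1.6.0 (hypothetical vanilla build)
# If the two version strings differ, the vanilla jar in lib.zip has to be
# replaced with the CDH build before shipping the archive.
```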

For reference, here is part of the config:

#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output

#
# JanusGraph HBase InputFormat configuration
#
janusgraphmr.ioformat.conf.storage.backend=hbase
#janusgraphmr.ioformat.conf.storage.hostname=fqdn1,fqdn2,fqdn3
janusgraphmr.ioformat.conf.storage.hostname=10.22.5.63:2181,10.22.5.64:2181,10.22.5.65:2181
janusgraphmr.ioformat.conf.storage.hbase.table=TEST0.2.0
janusgraphmr.ioformat.conf.storage.hbase.region-count=5
janusgraphmr.ioformat.conf.storage.hbase.regions-per-server=18
janusgraphmr.ioformat.conf.storage.hbase.short-cf-names=false
#zookeeper.znode.parent=/hbase-unsecure
# Security configs are needed in case of a secure cluster
#zookeeper.znode.parent=/hbase-secure
#hbase.rpc.protection=privacy
#hbase.security.authentication=kerberos

#
# SparkGraphComputer with Yarn Configuration
#

spark.master=yarn-client
spark.executor.memory=512m
spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
spark.yarn.dist.archives=/home/graph/janusgraph-0.2.0-SNAPSHOT-hadoop2.JOE/lib.zip
spark.yarn.dist.files=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.2.0-SNAPSHOT.jar
spark.yarn.dist.jars=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.2.0-SNAPSHOT.jar,/opt/cloudera/parcels/CDH/jars/spark-core_2.10-1.6.0-cdh5.10.0.jar
#spark.yarn.appMasterEnv.CLASSPATH=/etc/hadoop/conf:./lib.zip/*:
spark.yarn.appMasterEnv.CLASSPATH=/etc/hadoop/conf:/etc/hbase/conf:./lib.zip/*:/opt/cloudera/parcels/CDH/jars/spark-core_2.10-1.6.0-cdh5.10.0.jar
#spark.executor.extraClassPath=/etc/hadoop/conf:/etc/hbase/conf:/home/graph/janusgraph-0.2.0-SNAPSHOT-hadoop2/janusgraph-hbase-0.2.0-SNAPSHOT.jar:./lib.zip/*
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/native:/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/native:/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64
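For completeness, lib.zip is built flat (jars at the archive root), because spark.yarn.dist.archives unpacks it into a directory literally named lib.zip on each executor, and the classpath entries reference ./lib.zip/* (a sketch; paths are from my install):

```shell
# Build lib.zip with the jars at the top level of the archive (-j junks
# directory paths), so that ./lib.zip/* on the classpath resolves to them:
cd /home/graph/janusgraph-0.2.0-SNAPSHOT-hadoop2.JOE
zip -j lib.zip lib/*.jar
```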

Thank you!

-Joe


On 7/6/2017 4:15 AM, HadoopMarc wrote:


Readers wanting to run OLAP queries on a real spark-yarn cluster might want to check my recent post:

http://yaaics.blogspot.nl/2017/07/configuring-janusgraph-for-spark-yarn.html

Regards,  Marc
--
You received this message because you are subscribed to the Google Groups "JanusGraph users list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgra...@....
For more options, visit https://groups.google.com/d/optout.

