Marc, thank you for posting this. I'm trying to get it to work with our
CDH 5.10.0 distribution but have run into an issue; first, though, a
question. I'm using a 5-node cluster, and I think I do not need to set
zookeeper.znode.parent since the HBase configuration is in
/etc/hbase/conf. Is that correct?
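(If it does need to be set explicitly, I assume it would look like the
line below; /hbase is the CDH default znode parent, as far as I know.
Please correct me if that's wrong.)

# Assumed value - /hbase is the CDH default znode parent; adjust for your cluster:
zookeeper.znode.parent=/hbase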
The error that I'm getting is:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 10, host002, executor 1): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.x$330 of type org.apache.spark.api.java.function.PairFunction in instance of org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2238)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2156)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2232)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2156)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2232)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2112)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2232)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2156)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:423)
    at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
    at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2123)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2232)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2156)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2232)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2156)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:423)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:64)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Given this post:
https://stackoverflow.com/questions/28186607/java-lang-classcastexception-using-lambda-expressions-in-spark-job-on-remote-ser
It looks like I'm not including a necessary jar, but I'm at a
loss as to which one. Any ideas?
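For what it's worth, my only guess so far (and it is just a guess) is
that the executors also need the JanusGraph/TinkerPop jars on their
classpath, along the lines of the spark.executor.extraClassPath line I
currently have commented out in the config below:

# A guess on my part - expose the Hadoop/HBase conf dirs and the jars
# from lib.zip to every executor:
spark.executor.extraClassPath=/etc/hadoop/conf:/etc/hbase/conf:./lib.zip/*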
For reference, here is part of the config:
#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
#
# JanusGraph HBase InputFormat configuration
#
janusgraphmr.ioformat.conf.storage.backend=hbase
#janusgraphmr.ioformat.conf.storage.hostname=fqdn1,fqdn2,fqdn3
janusgraphmr.ioformat.conf.storage.hostname=10.22.5.63:2181,10.22.5.64:2181,10.22.5.65:2181
janusgraphmr.ioformat.conf.storage.hbase.table=TEST0.2.0
janusgraphmr.ioformat.conf.storage.hbase.region-count=5
janusgraphmr.ioformat.conf.storage.hbase.regions-per-server=18
janusgraphmr.ioformat.conf.storage.hbase.short-cf-names=false
#zookeeper.znode.parent=/hbase-unsecure
# Security configs are needed in case of a secure cluster
#zookeeper.znode.parent=/hbase-secure
#hbase.rpc.protection=privacy
#hbase.security.authentication=kerberos
#
# SparkGraphComputer with Yarn Configuration
#
spark.master=yarn-client
spark.executor.memory=512m
spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
spark.yarn.dist.archives=/home/graph/janusgraph-0.2.0-SNAPSHOT-hadoop2.JOE/lib.zip
spark.yarn.dist.files=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.2.0-SNAPSHOT.jar
spark.yarn.dist.jars=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.2.0-SNAPSHOT.jar,/opt/cloudera/parcels/CDH/jars/spark-core_2.10-1.6.0-cdh5.10.0.jar
#spark.yarn.appMasterEnv.CLASSPATH=/etc/hadoop/conf:./lib.zip/*:
spark.yarn.appMasterEnv.CLASSPATH=/etc/hadoop/conf:/etc/hbase/conf:./lib.zip/*:/opt/cloudera/parcels/CDH/jars/spark-core_2.10-1.6.0-cdh5.10.0.jar
#spark.executor.extraClassPath=/etc/hadoop/conf:/etc/hbase/conf:/home/graph/janusgraph-0.2.0-SNAPSHOT-hadoop2/janusgraph-hbase-0.2.0-SNAPSHOT.jar:./lib.zip/*
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/native:/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/native:/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64
Thank you!
-Joe
On 7/6/2017 4:15 AM, HadoopMarc wrote: