Error when running JanusGraph with YARN and CQL
Varun Ganesh <operatio...@...>
Hello,
I am trying to run SparkGraphComputer on a JanusGraph instance backed by Cassandra and Elasticsearch. I have previously verified that I can run SparkGraphComputer on a local Spark standalone cluster, and I am now trying to run it on YARN. I have a local YARN cluster running and have verified that it can run Spark jobs. I followed these guides:

http://yaaics.blogspot.com/2017/07/configuring-janusgraph-for-spark-yarn.html
http://tinkerpop.apache.org/docs/3.4.6/recipes/#olap-spark-yarn

And here is my read-cql-yarn.properties file:

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
gremlin.spark.persistContext=true

#
# JanusGraph Cassandra InputFormat configuration
#
# These properties define the connection settings that were used when writing data to JanusGraph.
janusgraphmr.ioformat.conf.storage.backend=cql
# Hostname and port of the Cassandra data store.
janusgraphmr.ioformat.conf.storage.hostname=127.0.0.1
janusgraphmr.ioformat.conf.storage.port=9042
# Keyspace where the data is stored.
janusgraphmr.ioformat.conf.storage.cql.keyspace=janusgraph
# Indexing backend configuration used when writing data to JanusGraph.
janusgraphmr.ioformat.conf.index.search.backend=elasticsearch
janusgraphmr.ioformat.conf.index.search.hostname=127.0.0.1
# Use the appropriate properties when using a different storage backend (HBase) or indexing backend (Solr).

#
# Apache Cassandra InputFormat configuration
#
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
cassandra.input.widerows=true

#
# SparkGraphComputer configuration
#
spark.master=yarn
spark.submit.deployMode=client
spark.executor.memory=1g
spark.yarn.dist.archives=/tmp/spark-gremlin.zip
spark.yarn.dist.files=/Users/my_comp/Downloads/janusgraph-0.5.2/lib/janusgraph-cql-0.5.2.jar
spark.yarn.appMasterEnv.CLASSPATH=/Users/my_comp/Downloads/hadoop-2.7.2/etc/hadoop:./spark-gremlin.zip/*
spark.executor.extraClassPath=/Users/my_comp/Downloads/hadoop-2.7.2/etc/hadoop:/Users/my_comp/Downloads/janusgraph-0.5.2/lib/janusgraph-cql-0.5.2.jar:./spark-gremlin.zip/*
spark.driver.extraLibraryPath=/Users/my_comp/Downloads/hadoop-2.7.2/lib/native:/Users/my_comp/Downloads/hadoop-2.7.2/lib/native/Linux-amd64-64
spark.executor.extraLibraryPath=/Users/my_comp/Downloads/hadoop-2.7.2/lib/native:/Users/my_comp/Downloads/hadoop-2.7.2/lib/native/Linux-amd64-64
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator

After a bunch of trial and error, I was able to get to the point where I see containers starting up in my YARN ResourceManager UI (port 8088).

Here is the code I am running (a simple count):

gremlin> graph = GraphFactory.open('conf/hadoop-graph/read-cql-yarn.properties')
==>hadoopgraph[cqlinputformat->nulloutputformat]
gremlin> g = graph.traversal().withComputer(SparkGraphComputer)
==>graphtraversalsource[hadoopgraph[cqlinputformat->nulloutputformat], sparkgraphcomputer]
gremlin> g.V().count()
18:49:03 ERROR org.apache.spark.scheduler.TaskSetManager - Task 2 in stage 0.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost task 2.3 in stage 0.0 (TID 10, 192.168.1.160, executor 1): java.lang.IllegalStateException: unread block data
    at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2862)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1682)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2366)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2290)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2148)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1647)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:483)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:441)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:370)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Would really appreciate it if someone could shed some light on this error and advise on next steps! Thank you!
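P.S. In case it is relevant, this is roughly how I built /tmp/spark-gremlin.zip (a sketch of the recipes' approach; the exact jar selection may differ in your setup):

# Put the jars at the zip root (-j junks directory paths), so that the
# ./spark-gremlin.zip/* classpath entries above resolve after YARN
# extracts the archive into each container's working directory.
cd /Users/my_comp/Downloads/janusgraph-0.5.2
zip -j /tmp/spark-gremlin.zip lib/*.jar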
Varun Ganesh <operatio...@...>
An update on this, I tried setting the env var below:
export HADOOP_GREMLIN_LIBS=$GREMLIN_HOME/lib

After doing this I was able to successfully run the tinkerpop-modern.kryo example from the recipes documentation (even though the guide at http://yaaics.blogspot.com/2017/07/configuring-janusgraph-for-spark-yarn.html explicitly asks us to ignore this step).

Unfortunately, it is still not working with CQL, but the error is now different. Please see below:

12:46:33 ERROR org.apache.spark.scheduler.TaskSetManager - Task 3 in stage 0.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 0.0 failed 4 times, most recent failure: Lost task 3.3 in stage 0.0 (TID 9, 192.168.1.160, executor 2): java.lang.NoClassDefFoundError: org/janusgraph/hadoop/formats/util/HadoopInputFormat
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
    at java.security.AccessController.doPrivileged(Native Method)
    ... (skipping)
Caused by: java.lang.ClassNotFoundException: org.janusgraph.hadoop.formats.util.HadoopInputFormat
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    ... 130 more

Is there some additional dependency that I may need to add? Thanks in advance!
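P.S. For reference, the kryo run that now succeeds is roughly the one from the recipes (using the stock gryo properties file that reads data/tinkerpop-modern.kryo; the exact path may differ per distribution):

gremlin> graph = GraphFactory.open('conf/hadoop-graph/hadoop-gryo.properties')
gremlin> g = graph.traversal().withComputer(SparkGraphComputer)
gremlin> g.V().count()
==>6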
Varun Ganesh <operatio...@...>
Answering my own question: I was able to fix the above error and successfully run the count job after explicitly adding /Users/my_comp/Downloads/janusgraph-0.5.2/lib/* to spark.executor.extraClassPath.
But I am not yet sure why that was needed. I had assumed that adding spark-gremlin.zip to the path would have provided the required dependencies.
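For the record, the executor classpath line in the properties file now looks something like this (the lib/* wildcard is the new part; the order of the entries is how I happen to have it):

spark.executor.extraClassPath=/Users/my_comp/Downloads/janusgraph-0.5.2/lib/*:/Users/my_comp/Downloads/hadoop-2.7.2/etc/hadoop:./spark-gremlin.zip/*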
HadoopMarc <bi...@...>
Hi Varun,

Good job. However, your last solution will only work with everything running on a single machine. So indeed, there is something wrong with either the contents of spark-gremlin.zip or the way it is put into the executor's local working directory. Note that you already put /Users/my_comp/Downloads/janusgraph-0.5.2/lib/janusgraph-cql-0.5.2.jar explicitly on the executor classpath, while it should have been available already through ./spark-gremlin.zip/*.

Oh, I think I see now what is different: you have used spark.yarn.dist.archives, while the TinkerPop recipes use spark.yarn.archive. They behave differently in whether or not the jars are extracted from the zip. I guess either can be used, provided it is done consistently. You can use the environment tab in the Spark web UI to inspect how things are picked up by Spark.

Best wishes,

Marc
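P.S. To illustrate the difference as I understand it (a sketch only; do check the Spark web UI's environment tab to confirm how the paths resolve in your setup):

# Variant used in this thread: YARN extracts the zip into a directory
# named after the archive inside each container, hence the
# ./spark-gremlin.zip/* classpath entries.
spark.yarn.dist.archives=/tmp/spark-gremlin.zip
spark.executor.extraClassPath=./spark-gremlin.zip/*

# Variant used in the TinkerPop recipes: the archive is treated as the
# Spark jar bundle and, as I understand the Spark-on-YARN docs, is
# localized under the alias __spark_libs__ and put on the container
# classpath by Spark itself.
spark.yarn.archive=/tmp/spark-gremlin.zip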
Varun Ganesh <operatio...@...>
Thanks a lot for responding, Marc.
Yes, I had initially tried setting spark.yarn.archive with the path to spark-gremlin.zip. However, with this approach the containers were failing with the message "Error: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher". I have yet to understand the differences between the spark.yarn.archive and HADOOP_GREMLIN_LIBS approaches. Will update this thread as I find out more.

Thank you,
Varun
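P.S. One guess, still to be confirmed: spark.yarn.archive is meant to ship the full Spark runtime, so if the zip contains only the Gremlin and JanusGraph jars, the application master has no spark-yarn jar and hence no org.apache.spark.deploy.yarn.ExecutorLauncher on its classpath. If that is right, the spark.yarn.archive route would need the archive built with Spark's own jars included, along these lines (the Spark home path here is illustrative):

# Sketch: include Spark's jars as well as the JanusGraph ones, since
# spark.yarn.archive replaces the default Spark jar distribution.
cd /Users/my_comp/Downloads/spark-2.4.0
zip -j /tmp/spark-gremlin.zip jars/*.jar /Users/my_comp/Downloads/janusgraph-0.5.2/lib/*.jar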