Janusgraph with YARN and HBASE
Fábio Dapper <fda...@...>
Hello, we have a cluster with Cloudera CDH 6.3.2, and I'm trying to run JanusGraph on the cluster with YARN and HBase, but without success.
(It works fine with Spark in local mode.)
Spark version: 2.4.2
HBase: 2.1.0-cdh6.3.2
JanusGraph: v0.5.2 and v0.4.1
I did a lot of searching, but I didn't find any recent references; they all use older versions of Spark and JanusGraph.
Some examples:
1) https://docs.janusgraph.org/advanced-topics/hadoop/
2) http://tinkerpop.apache.org/docs/current/recipes/#olap-spark-yarn
3) http://yaaics.blogspot.com/2017/07/configuring-janusgraph-for-spark-yarn.html
Following these references, I performed the steps below:
- Copy the following files to the JanusGraph "lib" directory:
- spark-yarn-2.11-2.4.0.jar
- scala-reflect-2.10.5.jar
- hadoop-yarn-server-web-proxy-2.7.2.jar
- guice-servlet-3.0.jar
- Generate a "/tmp/spark-gremlin-0.5.2.zip" file containing all the .jar files from "janusgraph/lib/".
- Create a configuration file called 'test.properties' from "conf/hadoop-graph/read-hbase-standalone-cluster.properties" by adding (or modifying) the properties below:
janusgraphmr.ioformat.conf.storage.hostname=XXX.XXX.XXX.XXX
spark.master=yarn
#spark.deploy-mode=client
spark.submit.deployMode=client
spark.executor.memory=1g
spark.yarn.dist.jars=/tmp/spark-gremlin-0-5-2.zip
spark.yarn.archive=/tmp/spark-gremlin-0-5-2.zip
spark.yarn.appMasterEnv.CLASSPATH=./__spark_libs__/*:[hadoop_conf_dir]
spark.executor.extraClassPath=./__spark_libs__/*:/[hadoop_conf_dir]
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
Then I ran the following commands:
graph = GraphFactory.open('conf/hadoop-graph/test.properties')
g = graph.traversal().withComputer(SparkGraphComputer)
g.V().count()
Can someone help me?
a) Are these problems related to version incompatibility?
b) Has anyone successfully used similar infrastructure?
c) Would anyone know how to determine a correct version of the necessary libraries?
d) Any suggestion?
Thank you all !!!
Below is a copy of the YARN log from my last attempt.
ERROR org.apache.spark.scheduler.TaskSetManager - Task 0 in stage 0.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, [SERVER_NAME], executor 1): java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V
at org.apache.spark.io.LZ4CompressionCodec.compressedInputStream(CompressionCodec.scala:122)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$6.apply(TorrentBroadcast.scala:304)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$6.apply(TorrentBroadcast.scala:304)
at scala.Option.map(Option.scala:146)
at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:304)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:235)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:89)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Thank you!!
Petr Stentor <kiri...@...>
Hi!
Try this:
spark.io.compression.codec=snappy
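For anyone hitting the same stack trace: the `NoSuchMethodError` on `net.jpountz.lz4.LZ4BlockInputStream.<init>(InputStream, boolean)` usually indicates two incompatible lz4 jars on the executor classpath. Spark 2.4 expects a recent lz4-java, while an older lz4 jar bundled in the JanusGraph lib directory (and therefore in the distributed zip) can shadow it. Switching the broadcast/shuffle compression codec to snappy sidesteps lz4 entirely. As a sketch, this is one extra line in the same test.properties file (the exact root cause above is an inference from the trace, not confirmed in this thread):

```properties
# Use snappy instead of the default lz4 to avoid the conflicting lz4 jars
spark.io.compression.codec=snappy
```

An alternative, if you want to keep lz4, would be to remove the stale lz4 jar from the bundled archive so only Spark's own version is found.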
Fábio Dapper <fda...@...>
Perfect!!!
That's it!
Thank you, very much!!!