Re: Janusgraph with YARN and HBASE


Petr Stentor <kiri...@...>
 


Hi!

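Try setting this in your properties file:
spark.io.compression.codec=snappy

The stack trace fails inside Spark's LZ4CompressionCodec, so switching the codec to snappy avoids the LZ4 code path.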
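
If you would rather track down the conflicting library than switch codecs, it may help to see which jars actually ship the class named in the stack trace quoted below. A NoSuchMethodError like this one usually means the class was loaded from an older jar than the one Spark was compiled against. The following is only a minimal sketch in Python; the function name jars_containing and the directory you pass to it are illustrative assumptions, not part of JanusGraph or Spark.

```python
import zipfile
from pathlib import Path

# Class from the stack trace whose constructor could not be found.
TARGET = "net/jpountz/lz4/LZ4BlockInputStream.class"


def jars_containing(lib_dir, entry=TARGET):
    """Return the sorted names of .jar files in lib_dir that contain `entry`.

    Hypothetical helper for illustration: it only inspects archive
    listings, it does not load or run any Java code.
    """
    hits = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that carry a .jar extension but are not valid archives.
            continue
    return hits
```

Calling jars_containing on the JanusGraph "lib" directory, and again on the Spark jars directory of the cluster, should show whether two different lz4 jars are involved. If more than one jar is listed, or the version differs between the two locations, that mismatch is a likely candidate for the error.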

On Thursday, July 23, 2020 at 1:57:38 UTC+3, Fábio Dapper wrote:

Hello, we have a cluster running Cloudera CDH 6.3.2, and I'm trying to run JanusGraph on it with YARN and HBase, but without success.
(It works fine with Spark in local mode.)

Spark: 2.4.2
HBase: 2.1.0-cdh6.3.2
JanusGraph: 0.5.2 and 0.4.1

I did a lot of searching but didn't find any recent references; the ones I found all use older versions of Spark and JanusGraph.

Some examples:

According to these references, I followed the following steps:

  1. Copy the following files to the Janusgraph "lib" directory:
    1. spark-yarn-2.11-2.4.0.jar
    2. scala-reflect-2.10.5.jar
    3. hadoop-yarn-server-web-proxy-2.7.2.jar
    4. guice-servlet-3.0.jar
  2. Generate a "/tmp/spark-gremlin-0.5.2.zip" file containing all the .jar files from "janusgraph/lib/".
  3. Create a configuration file called 'test.properties' from conf/hadoop-graph/read-hbase-standalone-cluster.properties by adding (or modifying) the properties below:

        janusgraphmr.ioformat.conf.storage.hostname=XXX.XXX.XXX.XXX
        spark.master=yarn
        #spark.deploy-mode=client
        spark.submit.deployMode=client
        spark.executor.memory=1g
        spark.yarn.dist.jars=/tmp/spark-gremlin-0-5-2.zip
        spark.yarn.archive=/tmp/spark-gremlin-0-5-2.zip
        spark.yarn.appMasterEnv.CLASSPATH=./__spark_libs__/*:[hadoop_conf_dir]
        spark.executor.extraClassPath=./__spark_libs__/*:/[hadoop_conf_dir]
        spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native



Then I ran the following commands:
    graph = GraphFactory.open('conf/hadoop-graph/test.properties')
    g = graph.traversal().withComputer(SparkGraphComputer)
    g.V().count()

Can someone help me?
a) Are these problems related to version incompatibility?
b) Has anyone successfully used similar infrastructure?
c) Would anyone know how to determine a correct version of the necessary libraries?
d) Any suggestion?


Thank you all !!!

Below is a copy of the YARN log from my last attempt.

ERROR org.apache.spark.scheduler.TaskSetManager  - Task 0 in stage 0.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, [SERVER_NAME], executor 1): java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V
    at org.apache.spark.io.LZ4CompressionCodec.compressedInputStream(CompressionCodec.scala:122)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$6.apply(TorrentBroadcast.scala:304)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$6.apply(TorrentBroadcast.scala:304)
    at scala.Option.map(Option.scala:146)
    at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:304)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:235)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
    at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
    at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
    at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
    at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
    at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:89)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Thank you!!
