Re: Janusgraph spark on yarn error


j2kupper@...
 

On Tue, Jan 19, 2021 at 10:33 AM, <hadoopmarc@...> wrote:
#Private reply from OP:
Yes, I am running a bulk load from HDFS (GraphSON) into janusgraph-hbase.
Yes, I have GraphSON part files from a Spark job with a structure like the grateful-dead.json example.

But if the application master starts on a certain (third) Hadoop node, everything works well.
All nodes have identical configuration.

#Answer HadoopMarc
You do not need to use HadoopGraph for this. Indeed, there used to be a BulkLoaderVertexProgram in Apache TinkerPop, but it could not be maintained to keep working reliably across the various versions of the various graph systems. To date, JanusGraph has not developed its own BulkLoaderVertexProgram. Also note that while there is an HBaseInputFormat for loading a janusgraph-hbase graph into a HadoopGraph, there is no HBaseOutputFormat for writing a HadoopGraph into janusgraph-hbase.

This being said, nothing is lost. You can simply write a Spark application in which the individual Spark executors connect to JanusGraph in the usual (OLTP) way and load data with the usual graph.traversal() API, that is, using the addV(), addE() and property() traversal steps. Of course, you could also try to copy the old BulkLoaderVertexProgram code into your project, but I believe the way I sketched is conceptually simpler and less error prone.
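A minimal sketch of what such a Spark loading job could look like, assuming a Gremlin Server running in front of JanusGraph and gremlinpython installed on the executors (the host, input path, vertex label and property key below are placeholders, not taken from any real setup):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("janusgraph-oltp-load").getOrCreate()

def load_partition(rows):
    # Import inside the function so the imports run on the executors.
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.process.anonymous_traversal import traversal

    # One connection per partition, reused for every row in that partition
    # (hypothetical Gremlin Server address).
    conn = DriverRemoteConnection('ws://192.168.1.11:8182/gremlin', 'g')
    g = traversal().withRemote(conn)
    try:
        for row in rows:
            # iterate() executes the traversal without pulling results back.
            g.addV('song').property('name', row['name']).iterate()
    finally:
        conn.close()

# Assumed input: one JSON object per line with at least a 'name' field.
spark.read.json('hdfs:///data/vertices.json').foreachPartition(load_partition)
spark.stop()

Each partition opens a single connection and reuses it for all of its rows; edges would be added in the same loop with addE().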

I seem to remember that there are some blog series about using JanusGraph at scale, but I do not have them at hand and will look for them later on. If you find these blogs yourself, please post the links!

Best wishes,      Marc


Thank you for the response!

I am using BulkLoaderVertexProgram from the console. Sometimes it works correctly.
This error still exists when I run the read-from-HBase Spark job.

My read-hbase.properties:

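# HadoopGraph setup: read the graph via HBaseInputFormat, write Gryo output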
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat

gremlin.hadoop.jarsInDistributedCache=false
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output

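# JanusGraph storage settings passed through to the HBaseInputFormat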
janusgraphmr.ioformat.conf.storage.backend=hbase
janusgraphmr.ioformat.conf.storage.hostname=192.168.1.11,192.168.1.12,192.168.1.13,192.168.1.14
janusgraphmr.ioformat.conf.storage.hbase.table=testTable


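# Spark on YARN settings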
spark.master=yarn
spark.submit.deployMode=client
spark.yarn.archive=/usr/local/janusgraph/jg_libs.zip
spark.executor.instances=2
spark.driver.memory=8g
spark.driver.cores=4
spark.executor.cores=5
spark.executor.memory=19g

spark.executor.extraClassPath=/usr/local/janusgraph/lib:/usr/local/hadoop/etc/hadoop/conf
spark.executor.extraJavaOptions=-Djava.library.path=/usr/local/hadoop/lib/native
spark.yarn.am.extraJavaOptions=-Djava.library.path=/usr/local/hadoop/lib/native
spark.yarn.appMasterEnv.CLASSPATH=/usr/local/janusgraph/lib:/usr/local/hadoop/etc/hadoop/conf


spark.driver.extraLibraryPath=/usr/local/hadoop/lib/native
spark.executor.extraLibraryPath=/usr/local/hadoop/lib/native

spark.dynamicAllocation.enabled=false
spark.io.compression.codec=snappy
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator

Can you provide some example code for a Spark application that loads data the OLTP way?
Which programming language can I use? (I would like Python, if possible.)
