Janusgraph spark on yarn error
j2kupper@...
Hi!
|
hadoopmarc@...
Hi
OK, do I understand correctly that you want to bulk load data from HDFS into janusgraph-hbase? There is nothing wrong with that requirement; I just do not know how to ask this in a friendlier way! Is your input data really in GraphSON format? (It is difficult to get this right!) Once that is established, we can look further, because this is a broad subject.
Marc
|
hadoopmarc@...
#Private reply from OP:
Yes, I am running a bulk load from HDFS (GraphSON) into janusgraph-hbase. Yes, I have GraphSON part files from a Spark job with a structure like the grateful-dead.json example. But if the application master starts on a certain (third) Hadoop node, it works well. All nodes have an identical configuration.
#Answer HadoopMarc
You do not need to use HadoopGraph for this. Indeed, there used to be a BulkLoaderVertexProgram in Apache TinkerPop, but it could not be maintained and kept working reliably across the various versions of the various graph systems. So far, JanusGraph has not developed its own BulkLoaderVertexProgram. Also note that while there is an HBaseInputFormat for loading a janusgraph-hbase graph into a HadoopGraph, there is no HBaseOutputFormat for writing a HadoopGraph into janusgraph-hbase.
That being said, nothing is lost. You can simply write a Spark application in which the individual Spark executors connect to JanusGraph in the usual (OLTP) way and load data with the usual graph.traversal() API, that is, using the addV(), addE() and property() traversal steps. Of course, you could also try to copy the old code for the BulkLoaderVertexProgram into your project, but I believe the way I sketched is conceptually simpler and less error prone.
I seem to remember that there is a blog series about using JanusGraph at scale, but I do not have the links at hand and will look for them later on. If you find these blogs yourself, please post the links!
Best wishes,
Marc
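To make the OLTP loading idea above a bit more concrete, here is a minimal Java sketch of the bookkeeping involved: add vertices first, remember their ids, add edges between them, and commit in batches. It deliberately does not use the JanusGraph classes, so that it runs without a cluster. GraphWriter, InMemoryWriter and the "name" key are hypothetical stand-ins invented for this illustration, and the comments only indicate roughly which traversal steps the real code would use.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OltpLoadSketch {

    /** Hypothetical stand-in for a graph traversal source; not a JanusGraph type. */
    public interface GraphWriter {
        Object addVertex(String label, String name);        // real code, roughly: g.addV(label).property("name", name).next()
        void addEdge(String label, Object from, Object to); // real code, roughly: g.addE(label).from(v1).to(v2).iterate()
        void commit();                                      // real code, roughly: g.tx().commit()
    }

    /** In-memory writer so the sketch runs without HBase or JanusGraph. */
    public static class InMemoryWriter implements GraphWriter {
        public final List<String> vertices = new ArrayList<>();
        public final List<String> edges = new ArrayList<>();
        public int commits = 0;

        public Object addVertex(String label, String name) {
            vertices.add(label + ":" + name);
            return vertices.size() - 1; // the list index doubles as the vertex id
        }

        public void addEdge(String label, Object from, Object to) {
            edges.add(from + "-" + label + "->" + to);
        }

        public void commit() {
            commits++;
        }
    }

    /** Loads vertices first, then edges, committing after every batchSize mutations. */
    public static void load(GraphWriter g, List<String[]> vertexRows,
                            List<String[]> edgeRows, int batchSize) {
        Map<String, Object> idByName = new HashMap<>();
        int pending = 0;
        for (String[] row : vertexRows) { // row = {label, name}
            idByName.put(row[1], g.addVertex(row[0], row[1]));
            if (++pending == batchSize) { g.commit(); pending = 0; }
        }
        for (String[] row : edgeRows) {   // row = {label, fromName, toName}
            g.addEdge(row[0], idByName.get(row[1]), idByName.get(row[2]));
            if (++pending == batchSize) { g.commit(); pending = 0; }
        }
        if (pending > 0) g.commit();      // flush the final partial batch
    }

    public static void main(String[] args) {
        InMemoryWriter g = new InMemoryWriter();
        List<String[]> vertices = new ArrayList<>();
        vertices.add(new String[] {"artist", "Garcia"});
        vertices.add(new String[] {"song", "Dark Star"});
        List<String[]> edges = new ArrayList<>();
        edges.add(new String[] {"sungBy", "Dark Star", "Garcia"});
        load(g, vertices, edges, 2);
        System.out.println(g.vertices + " " + g.edges + " commits=" + g.commits);
    }
}
```

In a real loader the name-to-id map only works if all vertices referenced by an edge are visible to the same executor; otherwise the edge step would have to look up its endpoints in the graph first.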
|
j2kupper@...
Thank you for the response!
I am using BulkLoaderVertexProgram from the console. Sometimes it works correctly. This error still exists when I run a Spark job that reads from HBase.
My read-hbase.properties:
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.jarsInDistributedCache=false
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
janusgraphmr.ioformat.conf.storage.backend=hbase
janusgraphmr.ioformat.conf.storage.hostname=192.168.1.11,192.168.1.12,192.168.1.13,192.168.1.14
janusgraphmr.ioformat.conf.storage.hbase.table=testTable
spark.master=yarn
spark.submit.deployMode=client
spark.yarn.archive=/usr/local/janusgraph/janusgraph_libs.zip
spark.executor.instances=2
spark.driver.memory=8g
spark.driver.cores=4
spark.executor.cores=5
spark.executor.memory=19g
spark.executor.extraClassPath=/usr/local/janusgraph/lib:/usr/local/hadoop/etc/hadoop/conf
spark.executor.extraJavaOptions=-Djava.library.path=/usr/local/hadoop/lib/native
spark.yarn.am.extraJavaOptions=-Djava.library.path=/usr/local/hadoop/lib/native
spark.yarn.appMasterEnv.CLASSPATH=/usr/local/janusgraph/lib:/usr/local/hadoop/etc/hadoop/conf
spark.driver.extraLibraryPath=/usr/local/hadoop/lib/native
spark.executor.extraLibraryPath=/usr/local/hadoop/lib/native
spark.dynamicAllocation.enabled=false
spark.io.compression.codec=snappy
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator
Can you provide a code example of a Spark application that loads data the OLTP way? Which programming language can I use? (I would like Python, if that is possible.)
|
hadoopmarc@...
The path of the BulkLoaderVertexProgram might be doable, but I cannot help you on that one. In the stack trace above, the YARN application master from spark-yarn apparently tries to communicate with HBase but finds that various libraries do not match. This failure arises because the JanusGraph distribution does not include spark-yarn and thus is not handcrafted to work with spark-yarn.
For the path without BulkLoaderVertexProgram you inevitably need a JVM language (Java, Scala, Groovy). In this case, a Spark executor is unaware of any other executors running and is simply passed a callable (function) to execute (through RDD.mapPartitions() or through a spark-sql UDF). This callable can be part of a class that establishes its own JanusGraph instances in the OLTP way. Now you only have to deal with the executor CLASSPATH, which does not need spark-yarn, so the libs from the JanusGraph distribution suffice. Some example code can be found at: https://nitinpoddar.medium.com/bulk-loading-data-into-janusgraph-part-2-ca946db26582
Best wishes,
Marc
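As a rough illustration of the callable described above, the following Java sketch shows the shape of a per-partition function: it opens its own connection, drains the records of one partition, commits, closes the connection, and returns an iterator, which is what a mapPartitions-style function hands back. It uses only the JDK so that it runs without Spark. GraphConnection and loadPartition are hypothetical names invented for this example, and two plain lists stand in for two RDD partitions.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class PartitionLoaderSketch {

    /** Hypothetical stand-in for a JanusGraph instance opened on an executor. */
    public interface GraphConnection extends AutoCloseable {
        void write(String record); // real code would use the addV()/addE() steps here
        void commit();
        @Override
        void close();
    }

    /**
     * The callable handed to each executor. It knows nothing about other
     * partitions: it opens its own connection, writes every record of its
     * partition, commits once, and returns the number of records written.
     */
    public static Iterator<Integer> loadPartition(Iterator<String> records,
                                                  Supplier<GraphConnection> open) {
        int written = 0;
        try (GraphConnection graph = open.get()) { // closed even if a write fails
            while (records.hasNext()) {
                graph.write(records.next());
                written++;
            }
            graph.commit();
        }
        return Collections.singletonList(written).iterator();
    }

    public static void main(String[] args) {
        AtomicInteger opened = new AtomicInteger();
        AtomicInteger closed = new AtomicInteger();
        Supplier<GraphConnection> open = () -> {
            opened.incrementAndGet();
            return new GraphConnection() {
                public void write(String record) { /* no-op in this sketch */ }
                public void commit() { }
                public void close() { closed.incrementAndGet(); }
            };
        };
        // Two lists stand in for two RDD partitions that are processed independently.
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("v1", "v2", "v3"),
                Arrays.asList("v4"));
        for (List<String> partition : partitions) {
            int n = loadPartition(partition.iterator(), open).next();
            System.out.println("partition wrote " + n + " records");
        }
        System.out.println("opened=" + opened + " closed=" + closed);
    }
}
```

Opening the connection inside the callable, rather than on the driver, means nothing graph-related has to be serialized and shipped to the executors; only the executor CLASSPATH needs the JanusGraph libs.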
|
j2kupper@...
Thank you!
Sorry for the late reply. I am doing some experiments with JanusGraph.
|