ClassNotFoundException running Gremlin on Spark
borde...@...
Hello,
I'm attempting to transition from Titan to JanusGraph 0.1.0 and am having problems getting OLAP queries to work via Spark. I've loaded a graph with about 2 million vertices and tried to execute a simple count:
gremlin> graph = GraphFactory.open('janusgraph-olap.properties')
gremlin> g = graph.traversal(computer(SparkGraphComputer))
gremlin> g.V().count()
The job soon fails with "java.lang.ClassNotFoundException: org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer", which I know is in spark-gremlin-3.2.3.jar. This appears to happen before the Spark executor has a chance to start. I tried adding this jar to spark.executor.extraClassPath, but it didn't help. Does HADOOP_GREMLIN_LIBS come into play? I've tried fiddling with it but to no avail.
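For reference, this is roughly how I've been setting HADOOP_GREMLIN_LIBS before starting the console (the install path here is just an example, not my actual layout):

```shell
# Root of the JanusGraph/Gremlin Console install (example path; adjust).
GREMLIN_HOME=/opt/janusgraph

# HADOOP_GREMLIN_LIBS lists directories whose jars TinkerPop ships to the
# cluster when gremlin.hadoop.jarsInDistributedCache=true. The JanusGraph
# distribution keeps spark-gremlin and friends under lib/.
export HADOOP_GREMLIN_LIBS="$GREMLIN_HOME/lib"

echo "$HADOOP_GREMLIN_LIBS"
```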
I'm using HBase 1.1.2.2.5.3.0-37 and Spark 1.6 on HDP 2.5.3.0.
OLTP Gremlin queries work ok.
Here's my properties file:
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.memoryOutputFormat=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
gremlin.hadoop.deriveMemory=false
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=dummyoutput
janusgraphmr.ioformat.conf.storage.backend=hbase
janusgraphmr.ioformat.conf.storage.hostname=10.1.1.1,10.1.1.2,10.1.1.3
janusgraphmr.ioformat.conf.storage.port=2181
storage.backend=hbase
storage.hostname=10.1.1.1,10.1.1.2,10.1.1.3
storage.port=2181
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5
spark.master=yarn-client
spark.shuffle.service.enabled=true
spark.dynamicAllocation.enabled=true
spark.yarn.am.extraJavaOptions=-Dhdp.version=2.5.3.0-37
This was working fine using Titan.
Thanks,
Jerrell
Jason Plurad <plu...@...>
A similar message came up on the gremlin-users mailing list. You might want to compare notes with that thread.
https://groups.google.com/d/msg/gremlin-users/LYv-cvZ66hU/vqZJD4OzBQAJ
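The general shape of the fix discussed there is making sure the spark-gremlin jar (and its dependencies) are visible to the executors as well as the driver. As a sketch only (the jar locations are illustrative, not taken from your cluster):

```properties
# Put the TinkerPop jars on the executor classpath. With
# gremlin.hadoop.jarsInDistributedCache=true the jars land in the
# container's working directory, hence the leading "./".
spark.executor.extraClassPath=./spark-gremlin-3.2.3.jar:/usr/local/janusgraph/lib/*

# On HDP, the executors typically need the hdp.version system property
# too, mirroring what you already set for the application master:
spark.executor.extraJavaOptions=-Dhdp.version=2.5.3.0-37
```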
On Wednesday, May 17, 2017 at 1:12:16 AM UTC-4, Jerrell Schivers wrote: