Date
1 - 5 of 5
Unknown compressor type with sparkGraphComputer
Ajay Srivastava <Ajay.Sr...@...>
Hi,
I am executing gremlin query using SparkGraphComputer and get following exception -
gremlin> :remote connect tinkerpop.server conf/remote.yaml
==>Configured dev-3/192.101.167.171:8182
gremlin> :> olapgraph.traversal().withComputer(org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer).V().count()
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.IllegalArgumentException:
Unknown compressor type for id: 2
at org.janusgraph.graphdb.database.serialize.attribute.StringSerializer$CompressionType.getFromId(StringSerializer.java:273)
at org.janusgraph.graphdb.database.serialize.attribute.StringSerializer.read(StringSerializer.java:104)
at org.janusgraph.graphdb.database.serialize.attribute.StringSerializer.read(StringSerializer.java:38)
at org.janusgraph.graphdb.database.serialize.StandardSerializer.readObjectInternal(StandardSerializer.java:236)
at org.janusgraph.graphdb.database.serialize.StandardSerializer.readObject(StandardSerializer.java:224)
at org.janusgraph.graphdb.database.EdgeSerializer.readPropertyValue(EdgeSerializer.java:203)
at org.janusgraph.graphdb.database.EdgeSerializer.readPropertyValue(EdgeSerializer.java:193)
at org.janusgraph.graphdb.database.EdgeSerializer.parseRelation(EdgeSerializer.java:129)
at org.janusgraph.hadoop.formats.util.JanusGraphVertexDeserializer.readHadoopVertex(JanusGraphVertexDeserializer.java:100)
at org.janusgraph.hadoop.formats.util.GiraphRecordReader.nextKeyValue(GiraphRecordReader.java:60)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:168)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:29)
at org.apache.tinkerpop.gremlin.util.iterator.IteratorUtils$4.advance(IteratorUtils.java:298)
at org.apache.tinkerpop.gremlin.util.iterator.IteratorUtils$4.hasNext(IteratorUtils.java:269)
at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Here is my olapgraph configuration -
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
#
# JanusGraph HBase InputFormat configuration
#
janusgraphmr.ioformat.conf.storage.backend=hbase
janusgraphmr.ioformat.conf.storage.hostname=dev-1
janusgraphmr.ioformat.conf.storage.hbase.table=tryJanus
#
# SparkGraphComputer Configuration
#
spark.master=local[4]
spark.executor.memory=1g
spark.serializer=org.apache.spark.serializer.KryoSerializer
Is my configuration correct ?
Have I missed setting any property here ?
Regards,
Ajay
|
|
Ajay Srivastava <Ajay.Sr...@...>
The compression in all the column families is set to gz, which is second entry (id = 1) in Enum of both HBase and janusgraph. So there should not be any problem in running this query.
This query and others work well in embedded mode and gremlin’s oltp graph. Is StringSerializer called only when SparkGraphComputer is used ?
I have also run major_compaction twice in HBase but the problem remains.
Is this a bug ?
Is someone already running SparkGraphComputer with HBase as backend without any problem ?
Regards,
Ajay
|
|
marc.d...@...
Hi Ajay, Never tried this myself. Could you post the remote.yaml and gremlin-server.yaml files too (if they deviate from the distribution ones)? This way others will easier recognize serialization config problems. Cheers, Marc Op woensdag 4 oktober 2017 13:11:30 UTC+2 schreef Ajay Srivastava:
|
|
Ajay Srivastava <Ajay.Sr...@...>
Hi Marc,
gremlin-server.yaml —>
host: dev-3
port: 8182
scriptEvaluationTimeout: 30000
channelizer: org.apache.tinkerpop.gremlin.server.channel.WebSocketChannelizer
graphs: {
graph: conf/gremlin-server/socket-janusgraph-hbase-server.properties,
olapgraph: conf/hadoop-graph/read-hbase.properties }
plugins:
- janusgraph.imports
scriptEngines: {
gremlin-groovy: {
imports: [java.lang.Math],
staticImports: [java.lang.Math.PI],
scripts: [scripts/empty-sample.groovy]}}
serializers:
- { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GryoLiteMessageSerializerV1d0, config: {ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV2d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
processors:
- { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
- { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}
metrics: {
consoleReporter: {enabled: true, interval: 180000},
csvReporter: {enabled: true, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
jmxReporter: {enabled: true},
slf4jReporter: {enabled: true, interval: 180000},
gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
graphiteReporter: {enabled: false, interval: 180000}}
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536
ssl: {
enabled: false}
remote.yaml —>
hosts: [dev-3]
port: 8182
serializer: { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
Regards,
Ajay
|
|
marc.d...@...
Hi Ajay, I have no idea what is happening here, but since you use gremlin console, you could try the remote-object.yaml as an alternative. Any one else? Marc Op donderdag 5 oktober 2017 13:59:35 UTC+2 schreef Ajay Srivastava:
|
|