Re: Unknown compressor type with sparkGraphComputer


Ajay Srivastava <Ajay.Sr...@...>
 

The compression for all the column families is set to gz, which is the second entry (id = 1) in the compression enums of both HBase and JanusGraph, so this query should not fail on that account.
This query and others work fine in embedded mode and over Gremlin's OLTP graph. Is StringSerializer called only when SparkGraphComputer is used?

I have also run a major compaction twice in HBase, but the problem remains.

Is this a bug?
Is anyone already running SparkGraphComputer with HBase as the backend without any problems?
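For anyone trying to reproduce the failure mode in isolation: a minimal sketch (not the actual JanusGraph source; the real enum lives in org.janusgraph.graphdb.database.serialize.attribute.StringSerializer) of the kind of id-to-enum lookup that throws "Unknown compressor type for id: 2". With only two compression types defined, any id of 2 or higher in the serialized string header triggers the IllegalArgumentException seen in the trace:

```java
// Simplified reconstruction of the compressor-id lookup pattern.
// Entry names and the two-entry layout are assumptions for illustration;
// only the thrown message mirrors the error in the stack trace.
enum CompressionType {
    NO_COMPRESSION, // id 0
    GZIP;           // id 1

    static CompressionType getFromId(int id) {
        for (CompressionType t : values()) {
            if (t.ordinal() == id) return t;
        }
        throw new IllegalArgumentException("Unknown compressor type for id: " + id);
    }
}

public class CompressorLookupDemo {
    public static void main(String[] args) {
        System.out.println(CompressionType.getFromId(1)); // prints GZIP
        try {
            CompressionType.getFromId(2);
        } catch (IllegalArgumentException e) {
            // prints: Unknown compressor type for id: 2
            System.out.println(e.getMessage());
        }
    }
}
```

If the column family is genuinely gz-compressed (id 1), then an id of 2 suggests the bytes being handed to StringSerializer are not what it expects, rather than a truly unknown compressor.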


Regards,
Ajay

On 04-Oct-2017, at 4:41 PM, Ajay Srivastava <ajay.sr...@...> wrote:

Hi,

I am executing a Gremlin query using SparkGraphComputer and getting the following exception:

gremlin> :remote connect tinkerpop.server conf/remote.yaml
==>Configured dev-3/192.101.167.171:8182

gremlin> :> olapgraph.traversal().withComputer(org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer).V().count()

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.IllegalArgumentException: Unknown compressor type for id: 2
at org.janusgraph.graphdb.database.serialize.attribute.StringSerializer$CompressionType.getFromId(StringSerializer.java:273)
at org.janusgraph.graphdb.database.serialize.attribute.StringSerializer.read(StringSerializer.java:104)
at org.janusgraph.graphdb.database.serialize.attribute.StringSerializer.read(StringSerializer.java:38)
at org.janusgraph.graphdb.database.serialize.StandardSerializer.readObjectInternal(StandardSerializer.java:236)
at org.janusgraph.graphdb.database.serialize.StandardSerializer.readObject(StandardSerializer.java:224)
at org.janusgraph.graphdb.database.EdgeSerializer.readPropertyValue(EdgeSerializer.java:203)
at org.janusgraph.graphdb.database.EdgeSerializer.readPropertyValue(EdgeSerializer.java:193)
at org.janusgraph.graphdb.database.EdgeSerializer.parseRelation(EdgeSerializer.java:129)
at org.janusgraph.hadoop.formats.util.JanusGraphVertexDeserializer.readHadoopVertex(JanusGraphVertexDeserializer.java:100)
at org.janusgraph.hadoop.formats.util.GiraphRecordReader.nextKeyValue(GiraphRecordReader.java:60)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:168)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:29)
at org.apache.tinkerpop.gremlin.util.iterator.IteratorUtils$4.advance(IteratorUtils.java:298)
at org.apache.tinkerpop.gremlin.util.iterator.IteratorUtils$4.hasNext(IteratorUtils.java:269)
at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Here is my olapgraph configuration:

# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat

gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output

#
# JanusGraph HBase InputFormat configuration
#
janusgraphmr.ioformat.conf.storage.backend=hbase
janusgraphmr.ioformat.conf.storage.hostname=dev-1
janusgraphmr.ioformat.conf.storage.hbase.table=tryJanus

#
# SparkGraphComputer Configuration
#
spark.master=local[4]
spark.executor.memory=1g
spark.serializer=org.apache.spark.serializer.KryoSerializer

Is my configuration correct?
Have I missed setting any property here?
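One thing worth checking, though I can't say it is the cause here: the TinkerPop reference docs pair spark.serializer=KryoSerializer with TinkerPop's Gryo registrator, which the configuration above does not set. A hedged sketch of the two properties together:

```
# Kryo serializer plus TinkerPop's registrator, as shown in the TinkerPop docs
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoRegistrator
```

Whether the missing registrator can produce this particular deserialization error is an open question, but it is a cheap thing to rule out.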

Regards,
Ajay


--
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-users/89F8CB46-D78F-4DB5-ABAF-48F576E98F60%40guavus.com.