OLAP connection with Spark standalone cluster


Lilly <lfie...@...>
 

Hi everyone,

I downloaded a fresh Spark binary release (spark-2.4.0-hadoop2.7), set the master to spark://127.0.0.1:7077, and started all services via $SPARK_HOME/sbin/start-all.sh.
I verified that Spark works by running the provided example programs.

I am also using the janusgraph-0.4.0-hadoop2 binary distribution.

I then configured read-cassandra-3.properties as follows:
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cassandra.Cassandra3InputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
gremlin.spark.persistContext=true
janusgraphmr.ioformat.conf.storage.backend=cassandra
janusgraphmr.ioformat.conf.storage.hostname=127.0.0.1
janusgraphmr.ioformat.conf.storage.port=9160
janusgraphmr.ioformat.conf.storage.cassandra.keyspace=janusgraph
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
spark.master=spark://127.0.0.1:7077
spark.executor.memory=8g
spark.executor.extraClassPath=/home/janusgraph-0.4.0-hadoop2/lib/*
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoRegistrator

where the JanusGraph libraries are stored in /home/janusgraph-0.4.0-hadoop2/lib/

In my Java application I then tried:
Graph graph = GraphFactory.open("...");
GraphTraversalSource g = graph.traversal().withComputer(SparkGraphComputer.class);
followed by g.V().count().next().
I get the error message:
ERROR org.apache.spark.scheduler.TaskSetManager - Task 3 in stage 0.0 failed 4 times; aborting job
Exception in thread "main" java.lang.IllegalStateException: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 0.0 failed 4 times, most recent failure: Lost task 3.3 in stage 0.0 (TID 15, 192.168.178.32, executor 0): java.io.InvalidClassException: org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal; local class incompatible: stream classdesc serialVersionUID = -3191185630641472442, local class serialVersionUID = 6523257080464450267

Any ideas as to what might be the problem?
Thanks!
Lilly



marc.d...@...
 

Hi Lilly,

This error says that there are somehow two versions of the TinkerPop jars in your project. If you use Maven, you can check this with the dependency plugin.
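The check with the Maven dependency plugin could look like this (a sketch, assuming a standard Maven project; the `-Dincludes` filter narrows the output to TinkerPop artifacts):

```shell
# Print the dependency tree, keeping every occurrence of a TinkerPop
# artifact; with -Dverbose, duplicate versions that Maven resolved away
# are still shown, marked "omitted for conflict".
mvn dependency:tree -Dverbose -Dincludes=org.apache.tinkerpop

# Once the conflicting transitive dependency is found, exclude it in
# pom.xml or pin a single version in <dependencyManagement>.
```

With two TinkerPop versions on the classpath, the driver and the Spark executors can end up deserializing the same class from different jars, which is exactly what the `serialVersionUID` mismatch in the stack trace indicates.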

If other problems appear, also make sure the Spark cluster itself is healthy by running one of the examples from the Spark distribution with spark-submit.
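Such a health check could be run as follows (the examples jar name is an assumption based on the spark-2.4.0 binary distribution; adjust the Scala version suffix to match your download):

```shell
# Submit the bundled SparkPi example to the standalone master.
# A healthy cluster finishes the job and prints a line like
# "Pi is roughly 3.14..." near the end of the driver output.
$SPARK_HOME/bin/spark-submit \
  --master spark://127.0.0.1:7077 \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.0.jar 100
```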

HTH,    Marc




Abhay Pandit <abha...@...>
 

Hi Lilly,

SparkGraphComputer does not support direct Gremlin queries from a Java program.
You can try something like the following instead:

String query = "g.V().count()";
ComputerResult computerResult = graph.compute(SparkGraphComputer.class)
        .result(GraphComputer.ResultGraph.NEW)
        .persist(GraphComputer.Persist.EDGES)
        .program(TraversalVertexProgram.build()
                .traversal(
                        graph.traversal().withComputer(SparkGraphComputer.class),
                        "gremlin-groovy",
                        query)
                .create(graph))
        .submit()
        .get();

System.out.println(computerResult.memory().get("gremlin.traversalVertexProgram.haltedTraversers"));


Join my facebook group: https://www.facebook.com/groups/Janusgraph/

Thanks,
Abhay

--
You received this message because you are subscribed to the Google Groups "JanusGraph users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgra...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-users/e7336651-4265-4508-985b-64ed53935fff%40googlegroups.com.


Lilly <lfie...@...>
 

Hi Marc,

Great, this dependency plugin is precisely what I needed! I had tried to figure this out manually via Maven Central, but one goes crazy that way.
It now works perfectly. Thanks so much!

Lilly




Lilly <lfie...@...>
 

Hi Abhay,

It seems to work fine now, unless I am overlooking something. Why do you think it should not work?
It also worked before with the spark.master=local setting.

Thanks,
Lilly



Sai Supraj R
 

Hi, I tried the above solution but it is still throwing an error:

java.lang.Throwable: Hook creation trace
at org.janusgraph.graphdb.database.StandardJanusGraph.<init>(StandardJanusGraph.java:185) [load_test.jar:na]
at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:161) [load_test.jar:na]
at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:132) [load_test.jar:na]
at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:79) [load_test.jar:na]
at com.iqvia.janus.LoadDataTest1.main(LoadDataTest1.java:41) [load_test.jar:na]
Exception in thread "main" java.lang.IllegalArgumentException: Graph does not support the provided graph computer: SparkGraphComputer
at org.apache.tinkerpop.gremlin.structure.Graph$Exceptions.graphDoesNotSupportProvidedGraphComputer(Graph.java:1190)
at org.janusgraph.graphdb.tinkerpop.JanusGraphBlueprintsGraph.compute(JanusGraphBlueprintsGraph.java:157)
at com.iqvia.janus.LoadDataTest1.main(LoadDataTest1.java:58)

Thanks
Sai


hadoopmarc@...
 

Hi Sai,

This exception is not really related to this thread.

JanusGraph with SparkGraphComputer can only be used with the TinkerPop HadoopGraph. Therefore, the example in the JanusGraph ref docs has a properties file starting with the following lines:

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cassandra.Cassandra3InputFormat
Some of the other JanusGraph storage backends have their own InputFormat.
If you encounter other problems, please include the properties file and the calling code.
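The stack trace above shows the graph being opened with JanusGraphFactory.open, which returns a StandardJanusGraph; that class rejects SparkGraphComputer with exactly the "Graph does not support the provided graph computer" exception. A minimal sketch of the distinction (file names are assumptions; the properties file must set gremlin.graph to HadoopGraph as shown above):

```java
import org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer;
import org.apache.tinkerpop.gremlin.structure.Graph;
import org.apache.tinkerpop.gremlin.structure.util.GraphFactory;

public class OlapOpenSketch {
    public static void main(String[] args) throws Exception {
        // GraphFactory reads the gremlin.graph property from the file and
        // instantiates HadoopGraph, which supports SparkGraphComputer.
        Graph graph = GraphFactory.open("read-cassandra-3.properties");
        graph.compute(SparkGraphComputer.class);  // accepted

        // By contrast, JanusGraphFactory.open(...) returns a
        // StandardJanusGraph (an OLTP graph), and calling
        // compute(SparkGraphComputer.class) on it throws the
        // IllegalArgumentException seen in the stack trace above.
    }
}
```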

Best wishes,    Marc