Not able to run queries using SparkGraphComputer from Java


Sai Supraj R
 

Hi,

I am getting the following error when running queries using SparkGraphComputer from Java:
Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: Edge with id already exists: 1469152598528
at org.janusgraph.hadoop.formats.util.JanusGraphVertexDeserializer.readHadoopVertex(JanusGraphVertexDeserializer.java:182)
at org.janusgraph.hadoop.formats.util.HadoopRecordReader.nextKeyValue(HadoopRecordReader.java:69)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:230)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:187)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
... 3 more

code:
Graph graph = JanusGraphFactory.open("read-cql.properties");
GraphTraversalSource g = createTraversal();
x = g.V().count()

read-cql.properties:
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat


gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
gremlin.spark.persistContext=true

janusgraphmr.ioformat.conf.storage.backend=cql
# This specifies the hostname & port for Cassandra data store.
janusgraphmr.ioformat.conf.storage.hostname=10.88.68.52,10.88.68.11,10.88.68.47
janusgraphmr.ioformat.conf.storage.port=9042
# This specifies the keyspace where data is stored.
janusgraphmr.ioformat.conf.storage.cql.keyspace=iqvia

cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
cassandra.input.widerows=true
spark.master=local[*]
spark.executor.memory=1g
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator

Thanks
Sai



hadoopmarc@...
 

Hi Sai,

The calling code you present is not complete.

The first line should read (because HadoopGraph does not derive from JanusGraph):
Graph graph = GraphFactory.open("read-cql.properties");
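A quick way to catch this kind of mix-up before starting Spark is to look at the gremlin.graph key of the properties file. The sketch below uses only the JDK; the class and method names (GraphClassCheck, needsGenericGraphFactory, load) are made up for illustration and are not part of TinkerPop or JanusGraph. It only inspects the configuration and does not open a graph.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper, not part of TinkerPop or JanusGraph.
final class GraphClassCheck {
    // Value of gremlin.graph used in read-cql.properties above.
    static final String HADOOP_GRAPH =
            "org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph";

    // Reads a Java properties file such as read-cql.properties.
    static Properties load(Path file) throws IOException {
        Properties config = new Properties();
        try (Reader reader = Files.newBufferedReader(file)) {
            config.load(reader);
        }
        return config;
    }

    // True when the config names HadoopGraph, in which case the graph
    // should be opened with the generic GraphFactory.open(...).
    static boolean needsGenericGraphFactory(Properties config) {
        return HADOOP_GRAPH.equals(config.getProperty("gremlin.graph", "").trim());
    }

    public static void main(String[] args) throws IOException {
        Properties config;
        if (args.length > 0) {
            config = load(Paths.get(args[0]));
        } else {
            // Fallback so the sketch runs without a file on disk.
            config = new Properties();
            config.setProperty("gremlin.graph", HADOOP_GRAPH);
        }
        System.out.println(needsGenericGraphFactory(config)
                ? "gremlin.graph is HadoopGraph: use GraphFactory.open(...)"
                : "gremlin.graph is not HadoopGraph");
    }
}
```

Run it with the path to the properties file as the only argument, for example read-cql.properties.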
Best wishes,    Marc



Sai Supraj R
 

Hi Marc,

Sorry, my bad, I posted the wrong code.

I used Graph graph = GraphFactory.open("read-cql.properties");

and I got the above error.

Thanks
Sai





hadoopmarc@...
 

Hi Sai,

What happens in createTraversal()?

What do you get with g.V(1469152598528).elementMap() if you open the graph for OLTP queries?

Best wishes,   Marc


Sai Supraj R
 

Hi Marc,
I got this when querying using OLTP:
gremlin> g.V(1469152598528)
==>v[1469152598528]
gremlin> g.V(1469152598528).elementMap()
==>[id:1469152598528,label:vertex]

I am also trying to run SparkGraphComputer with YARN on EMR, but it fails with the exception below.

Spark version = 2.4.4
Scala version = 2.12.10
java.io.FileNotFoundException: File file:/home/hadoop/.sparkStaging/application_1618505307369/__spark_libs__910446852825.zip does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:671)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:992)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:661)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:464)
at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:269)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:67)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:414)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:411)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:411)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:243)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:236)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:224)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)


I followed this blog but ended up with the above exception:

Thanks
Sai





hadoopmarc@...
 

Hi Sai,

The blog you mentioned is a bit outdated and was written for Spark 1.x. To get an idea of the changes needed to get OLAP running with Spark 2.x, take a look at:
https://tinkerpop.apache.org/docs/current/recipes/#olap-spark-yarn
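As a rough illustration of the kind of settings that recipe deals with, here is a JDK-only sketch that layers YARN overrides on top of an existing configuration such as read-cql.properties. The property keys are standard Spark settings, and with Spark 2.x the master is plain "yarn" together with spark.submit.deployMode. The class name YarnOverrides, the archive path and the Hadoop conf directory are assumptions for illustration only, so check the exact values against the recipe and your cluster. The file: prefix in the FileNotFoundException suggests the staging directory was resolved on the local filesystem, which often means the Hadoop configuration directory is not visible on the driver classpath, but that is a guess from the stack trace alone.

```java
import java.util.Properties;

// Hypothetical helper; only the property keys are real Spark settings.
final class YarnOverrides {
    // Returns a copy of base with Spark-on-YARN settings applied.
    // archivePath: zip containing the Spark/Gremlin jars (assumed to exist).
    // hadoopConfDir: directory holding core-site.xml, yarn-site.xml, etc.
    static Properties apply(Properties base, String archivePath, String hadoopConfDir) {
        Properties conf = new Properties();
        conf.putAll(base); // leave the caller's properties untouched

        // Spark 2.x: "yarn" plus an explicit deploy mode.
        conf.setProperty("spark.master", "yarn");
        conf.setProperty("spark.submit.deployMode", "client");

        // Ship the jars as one archive instead of relying on a local install.
        conf.setProperty("spark.yarn.archive", archivePath);

        // Make the unpacked jars and the Hadoop config visible on the cluster side.
        String classpath = "./__spark_libs__/*:" + hadoopConfDir;
        conf.setProperty("spark.yarn.appMasterEnv.CLASSPATH", classpath);
        conf.setProperty("spark.executor.extraClassPath", classpath);
        return conf;
    }

    public static void main(String[] args) {
        Properties base = new Properties();
        base.setProperty("spark.master", "local[*]");
        // Both paths below are placeholders, not values from this thread.
        Properties conf = apply(base, "/tmp/spark-gremlin.zip", "/etc/hadoop/conf");
        conf.stringPropertyNames().stream().sorted()
                .forEach(key -> System.out.println(key + "=" + conf.getProperty(key)));
    }
}
```

The resulting properties would still have to be handed to GraphFactory.open(...) as a configuration object rather than a file name, which is not shown here.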

Best wishes,    Marc