Erratic behavior with Cassandra input and SparkGraphComputer OLAP engine


Samik R <sam...@...>
 

Hi,

I am testing out SparkGraphComputer for OLAP queries, directly reading data from a JanusGraph-Cassandra-ES instance. Everything is running on a single VM, and I have built JanusGraph on the box by cloning the repo. I am using Hadoop 2.7.1 with Spark 1.6.1, and Cassandra 2.1.9 (same version as packaged).

I am using the properties file mentioned in this SO thread, mostly because that setup matches mine. I initially tried a smaller graph with ~1K nodes and 1.5K edges, and things seemed to work fine. However, when I run OLAP queries against ~300K nodes, I hit various issues:

  • Initially, I was hit by the exception "java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Frame size (20784689) larger than max length (15728640)!". After some research, I added the following line to the properties file: cassandra.thrift.framed.size_mb=200
  • On the next try, the Cassandra process died when I ran the query. The Gremlin Server and ES processes kept running, though.
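For context, the relevant parts of a read-cassandra.properties for this kind of single-node setup look roughly like this. This is only a sketch based on the standard TinkerPop/JanusGraph Hadoop property names for Cassandra OLAP input; the hostname, port, and partitioner values are illustrative and may differ from my actual file, and the frame-size line is the one added above:

```properties
# Hadoop graph backed by Cassandra input, Gryo output
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.janusgraph.hadoop.formats.cassandra.CassandraInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output

# JanusGraph storage backend being read (single-node, thrift)
janusgraphmr.ioformat.conf.storage.backend=cassandrathrift
janusgraphmr.ioformat.conf.storage.hostname=127.0.0.1
janusgraphmr.ioformat.conf.storage.port=9160

# Cassandra input settings; the frame-size line is the workaround
# added after the "Frame size larger than max length" exception
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
cassandra.thrift.framed.size_mb=200

# Spark running locally on the same VM
spark.master=local[*]
spark.serializer=org.apache.spark.serializer.KryoSerializer
```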

gremlin> graph = GraphFactory.open("conf/hadoop-graph/read-cassandra.properties")
==>hadoopgraph[cassandrainputformat->gryooutputformat]
gremlin> g = graph.traversal().withComputer(SparkGraphComputer)
==>graphtraversalsource[hadoopgraph[cassandrainputformat->gryooutputformat], sparkgraphcomputer]
gremlin> g.V().count()
[Stage 0:===21:16:23 ERROR org.apache.spark.executor.Executor  - Exception in task 4.0 in stage 0.0 (TID 4)
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException
...

org.janusgraph.core.JanusGraphException: Could not execute operation due to backend exception
...

Caused by: org.janusgraph.diskstorage.PermanentBackendException: Permanent failure in storage backend
...

Caused by: org.apache.thrift.transport.TTransportException

...


  • I restarted JanusGraph and retried the same query. This time the query went through, but the same exception reappeared when I tried a groupCount().

gremlin> g.V().count()
                                                                        ==>108156
gremlin> g.V().groupCount().by(T.label)
[Stage 0:>                           21:23:49 ERROR org.apache.spark.executor.Executor  - Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException


  • After another restart, the groupCount() query went through, but the Gremlin shell got killed when I tried the count query. All three daemons (Gremlin Server, Cassandra and ES) were still running, though.

gremlin> g.V().groupCount().by(T.label)
[Stage 2:>                                                          (0 +==>[hotLead:1,proactiveChatInvite:1,chatSession:906,webPage:56921,buttonChatInvite:1,webPurchase:1,visitor:1269,webSession:27378,device:21677,cart:1]
gremlin> g.V().count()
[Stage 0:>                           Killed                                    
samik@samik-lap:~/git/janusgraph$ Write failed: Broken pipe


This all seems pretty erratic to me. Any suggestions on how to get consistent results with this setup?


Regards.

-Samik
