OLAP, Hadoop, Spark and Cassandra


sly...@...
 

Hi,

I'm trying to get JanusGraph 0.4.0 with a Cassandra (CQL) backend setup and running as OLAP while still keeping OLTP active in order to do graph updates. I've been searching high and low for some guidance, but so far without any luck. Hopefully someone here could tune in and help?

Here's where I'm at currently

  • local Hadoop running according to https://old-docs.janusgraph.org/0.4.0/hadoop-tp3.html
  • gremlin server started as /bin/gremlin-server.sh conf/gremlin-server/gremlin-server-configuration.yaml
  • gremlin-server-configuration.yaml points to init.groovy script doing the traversal mappings for OLTP and OLAP
def globals = [:]
ve = ConfiguredGraphFactory.open("ve_graph")
OLAPGraph = GraphFactory.open('conf/hadoop-graph/read-cql.properties')
globals << [g : ve.traversal(), sg: OLAPGraph.traversal().withComputer(org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer)]
  • conf/hadoop-graph/read-cql.properties reads
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
gremlin.spark.persistContext=true
janusgraphmr.ioformat.conf.storage.backend=cql
janusgraphmr.ioformat.conf.storage.hostname=127.0.0.1
janusgraphmr.ioformat.conf.storage.port=9042
janusgraphmr.ioformat.conf.storage.cassandra.keyspace=janusgraph
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator
  • Running the gremlin shell I have
         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/sture/Scripts/janusgraph-0.4.0-hadoop2/lib/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/sture/Scripts/janusgraph-0.4.0-hadoop2/lib/logback-classic-1.1.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
plugin activated: tinkerpop.server
plugin activated: tinkerpop.tinkergraph
plugin activated: tinkerpop.hadoop
plugin activated: tinkerpop.spark
plugin activated: tinkerpop.utilities
plugin activated: janusgraph.imports
gremlin> :remote connect tinkerpop.server conf/remote.yaml session
==>Configured localhost/127.0.0.1:8182-[655848fc-b46e-40be-8174-f0dc42cdabd4]
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8182]-[655848fc-b46e-40be-8174-f0dc42cdabd4] - type ':remote console' to return to local mode
gremlin> g
==>graphtraversalsource[standardjanusgraph[cql:[127.0.0.1]], standard]
gremlin>
gremlin> sg
==>graphtraversalsource[hadoopgraph[cqlinputformat->gryooutputformat], sparkgraphcomputer]
gremlin> g.V().has('lbl','System').count()
==>68
gremlin> sg.V().has('lbl','System').count()
  • The job is running for some time and while finishing the gremlin-server.log reads
253856 [Executor task launch worker for task 768] INFO  org.apache.spark.executor.Executor  - Finished task 768.0 in stage 0.0 (TID 768). 2388 bytes result sent to driver
253858 [task-result-getter-1] INFO  org.apache.spark.scheduler.TaskSetManager  - Finished task 768.0 in stage 0.0 (TID 768) in 6809 ms on localhost (executor driver) (769/769)
253861 [dag-scheduler-event-loop] INFO  org.apache.spark.scheduler.DAGScheduler  - ResultStage 0 (fold at SparkStarBarrierInterceptor.java:101) finished in 161.427 s
253861 [task-result-getter-1] INFO  org.apache.spark.scheduler.TaskSchedulerImpl  - Removed TaskSet 0.0, whose tasks have all completed, from pool
253876 [SparkGraphComputer-boss] INFO  org.apache.spark.scheduler.DAGScheduler  - Job 0 finished: fold at SparkStarBarrierInterceptor.java:101, took 161.598267 s
253888 [SparkGraphComputer-boss] INFO  org.apache.spark.rdd.MapPartitionsRDD  - Removing RDD 1 from persistence list
253901 [block-manager-slave-async-thread-pool-0] INFO  org.apache.spark.storage.BlockManager  - Removing RDD 1
  • However - the count (==> ) reads 0 for the sg traversal
I've most likely missed some crucial point here, but I'm not able to spot it. Please help.


Join janusgraph-users@lists.lfaidata.foundation to automatically receive all group messages.