Re: Janusgraph - OLAP using Dataproc


SAURABH VERMA <saurabh...@...>
 

We've set up JanusGraph OLAP with Spark on YARN; is that something you are looking for?

Thanks

On Thu, Jun 18, 2020 at 10:39 PM <bobo...@...> wrote:
Hi,

We are using JanusGraph (0.5.2) with ScyllaDB as the backend. So far we are only using the OLTP capabilities, but we would now like to do some more advanced batch processing, for example to create shortcut edges for recommendations. For that, I would like to use the OLAP features.
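For illustration, the "shortcut edge" idea here is to precompute a direct edge between entities that are connected by a multi-hop path, e.g. two users who interacted with the same product. A minimal, JanusGraph-free sketch of that idea in plain Java (all names and toy data are hypothetical):

```java
import java.util.*;

public class ShortcutEdges {
    // Connect every pair of users who share at least one product.
    static Set<String> shortcuts(Map<String, Set<String>> bought) {
        Set<String> out = new TreeSet<>();
        for (String u : bought.keySet())
            for (String v : bought.keySet())
                // u < v avoids duplicates and self-pairs
                if (u.compareTo(v) < 0 && !Collections.disjoint(bought.get(u), bought.get(v)))
                    out.add(u + "->" + v);
        return out;
    }

    public static void main(String[] args) {
        // user -> products they interacted with (hypothetical toy data)
        Map<String, Set<String>> bought = Map.of(
            "alice", Set.of("p1", "p2"),
            "bob",   Set.of("p2", "p3"),
            "carol", Set.of("p3"));
        System.out.println(shortcuts(bought)); // [alice->bob, bob->carol]
    }
}
```

In the actual graph, an OLAP traversal would compute these pairs over the whole dataset and a second pass would write them back as edges.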

Reading the documentation, this sounds pretty straightforward, assuming one has a Hadoop cluster up and running. But here comes my problem: I would like to use Dataproc, Google's managed solution for Hadoop and Spark. Unfortunately, I couldn't find any further information on how to get those two things playing well together.

Does anyone have any experience, hints or documentation on how to properly configure Janusgraph with Dataproc?

As a very first step, I was trying the following (a Java application with embedded JanusGraph):

GraphTraversalSource g = GraphFactory.open("graph.properties").traversal().withComputer(SparkGraphComputer.class);
long count = g.V().count().next();
...
g.close();

with the graph.properties file looking like this:

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
gremlin.spark.persistContext=true

# Cassandra
janusgraphmr.ioformat.conf.storage.backend=cql
janusgraphmr.ioformat.conf.storage.hostname=myhost
janusgraphmr.ioformat.conf.storage.port=9042
janusgraphmr.ioformat.conf.index.search.backend=lucene
janusgraphmr.ioformat.conf.index.search.directory=/tmp/
janusgraphmr.ioformat.conf.index.search.hostname=127.0.0.1
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
cassandra.input.widerows=true

# Spark
spark.master=local[*]
spark.executor.memory=1g
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator


If I just run the code like this, without specifying anything else, nothing happens at all, only endless log output like this:

18:39:07.749 [Executor task launch worker for task 3] DEBUG o.j.g.t.StandardJanusGraphTx - Guava vertex cache size: requested=20000 effective=20000 (min=100)
18:39:07.749 [Executor task launch worker for task 3] DEBUG o.j.g.t.vertexcache.GuavaVertexCache - Created dirty vertex map with initial size 32
18:39:07.749 [Executor task launch worker for task 3] DEBUG o.j.g.t.vertexcache.GuavaVertexCache - Created vertex cache with max size 20000

Additionally, I added the hdfs-site extracted from Dataproc to my classpath, but that didn't help either.
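One assumption worth double-checking in the properties above: spark.master=local[*] runs the whole job inside the local driver JVM, so the Dataproc cluster would never actually be contacted. Submitting to the cluster's YARN resource manager would look roughly like this (a sketch, not verified against Dataproc; it assumes the cluster's Hadoop/YARN client configuration is visible on the application's classpath):

```
# Spark on YARN instead of a local master (sketch, exact values vary per cluster)
spark.master=yarn
spark.submit.deployMode=client
```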

The same setup works like a charm in the OLTP world (of course with a proper query, not one iterating over the whole graph... :D).

Any hints, ideas, experiences or links are greatly appreciated.

Looking forward to some answers,
Claire



--
Thanks & Regards,
Saurabh Verma,
India

