Loading Janus DB from a distributed program


robk...@...
 

I'm trying to write a Janus DB loader for my distributed application.  

I have "sub-graphs" which I can successfully load independently, per machine in my cluster.  I do this by using graph.addVertex and subsequently a loop of vertex.addEdge(otherVertex), for each vertex in my sub-graph.

However, I have a second kind of vertex, whose edges can refer to any other vertex on any other machine.  These effectively join my sub-graphs together.

JanusGraphVertices are not serializable, so I can't transport the vertices across machines.  Also, I can't seem to find a nice way to fetch these vertices back from Janus once the sub-graphs have been loaded.  Is there any way to do this nicely?

I know I could probably write the entire graph out to HDFS and load it into Janus with a bulk loader program.  However, my application doesn't currently require Hadoop and I'd like to avoid this if possible!

Thanks in advance,
Rob


HadoopMarc <m.c.d...@...>
 

Hi Rob,

JanusGraph provides indices on properties, so make sure that you have an index declared on the property by which you want to identify your vertices. In that case, you can just retrieve the unknown vertex from the JanusGraph backend using an indexed query. This is not as fast as caching in JanusGraph itself, but I do not see a way to realize that either.

Cheers,    Marc

In reply to Rob Keevil's message of Monday 6 March 2017, 17:03:24 UTC+1.


robk...@...
 

Thanks, this worked perfectly.  The JanusGraphTransaction object also has a getVertices method which allows me to fetch them by ID.  Performance seems pretty good on the small set of data I have; I'll retest shortly with a larger set.
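
A minimal Java sketch of the pattern discussed in this thread: since vertex objects cannot be serialized, each machine ships a small serializable reference (here the value of an indexed property) and the receiving side resolves it back to a vertex before adding the edge. The names EdgeRef, VertexResolver and externalId are hypothetical, and the in-memory map only stands in for the JanusGraph lookup so that the example runs without a cluster. In a real loader the resolver would presumably wrap an indexed query, or the getVertices call mentioned above if vertex IDs are shipped instead of property values.

```java
import java.io.*;
import java.util.*;

class CrossMachineEdges {

    /** Hypothetical serializable description of an edge whose endpoints live on different machines. */
    static final class EdgeRef implements Serializable {
        private static final long serialVersionUID = 1L;
        final String fromKey; // value of an indexed property on the source vertex (assumed name: externalId)
        final String toKey;   // value of the same property on the target vertex
        final String label;   // edge label

        EdgeRef(String fromKey, String toKey, String label) {
            this.fromKey = fromKey;
            this.toKey = toKey;
            this.label = label;
        }
    }

    /**
     * Hypothetical lookup abstraction. With JanusGraph this would wrap an
     * indexed query (for example g.V().has("externalId", key)); that wiring
     * is an assumption and is not shown here.
     */
    interface VertexResolver<V> {
        Optional<V> resolve(String key);
    }

    /** Turns the reference into bytes that can be sent to another machine. */
    static byte[] serialize(EdgeRef ref) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(ref);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Rebuilds the reference on the receiving machine. */
    static EdgeRef deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (EdgeRef) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not read EdgeRef", e);
        }
    }

    /** Resolves both endpoints and returns a description of the edge that would be created. */
    static <V> String link(EdgeRef ref, VertexResolver<V> resolver) {
        V from = resolver.resolve(ref.fromKey)
                .orElseThrow(() -> new NoSuchElementException("no vertex for key " + ref.fromKey));
        V to = resolver.resolve(ref.toKey)
                .orElseThrow(() -> new NoSuchElementException("no vertex for key " + ref.toKey));
        // With real vertices, this is where from.addEdge(ref.label, to) would be called.
        return from + " -[" + ref.label + "]-> " + to;
    }

    public static void main(String[] args) {
        // Stand-in for the already loaded sub-graphs: property value -> vertex id.
        Map<String, Long> loaded = new HashMap<>();
        loaded.put("user-1", 4104L);
        loaded.put("order-9", 8200L);
        VertexResolver<Long> resolver = key -> Optional.ofNullable(loaded.get(key));

        byte[] wire = serialize(new EdgeRef("user-1", "order-9", "placed"));
        EdgeRef received = deserialize(wire);
        System.out.println(link(received, resolver));
    }
}
```

Keeping the lookup behind an interface means the loader code does not care whether an endpoint is found through an indexed property or through its ID. A missing endpoint fails loudly rather than silently dropping the edge, which matters when sub-graphs are loaded in an arbitrary order.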