Create one Gremlin query for several requests in JanusGraph


Alexander Scherbatiy <stell...@...>
 

Hello,

I am writing a small system on top of JanusGraph that consists of nodes and links. A node has a type and a string value, and a link has a type and a list of other nodes and links.

The requirement is that a node with a given type and value is stored only once in JanusGraph.

I have RawNode and RawLink classes (with RawAtom as their supertype) that represent nodes and links as in-memory trees, which are not unique,
and Node and Link classes (with Atom as their supertype) that represent the unique nodes and links stored in JanusGraph.

The ordinary pattern for creating a node checks whether a node with the given type and value already exists in JanusGraph; if so, it returns that node's id, otherwise it creates a new node. The same applies to links.
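
The check-then-create pattern above can be illustrated without JanusGraph. This is only a minimal in-memory sketch: `NodeRegistry` and its counter-based ids are hypothetical names made up for the example, and a plain `HashMap` stands in for the graph.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the graph. It models only the
// "a node with the same type and value is stored once" requirement.
final class NodeRegistry {

    private final Map<List<String>, Long> idsByKey = new HashMap<>();
    private long nextId = 0;

    // Returns the id already assigned to (type, value), or assigns a new one.
    long getOrCreate(String type, String value) {
        return idsByKey.computeIfAbsent(Arrays.asList(type, value), key -> ++nextId);
    }

    // Number of distinct (type, value) pairs stored so far.
    int size() {
        return idsByKey.size();
    }
}
```

Asking twice for the same type and value returns the same id, and the registry grows only when a new pair is seen.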

Another thing I want to check is whether building and sending one Gremlin request to retrieve a link is faster than sending a separate Gremlin request for each of the link's children.

The full code is at the end of this message. I have a few questions:

1) The code that checks whether a node exists and creates it otherwise looks like this:
------------------------
GraphTraversal<Object, Vertex> addVertex = addV(LABEL_NODE)
        .property(T.id, storage.getNextId())
        .property(KIND, LABEL_NODE)
        .property(TYPE, node.type)
        .property(VALUE, node.value);

return V()
        .hasLabel(LABEL_NODE)
        .has(TYPE, node.type)
        .has(VALUE, node.value)
        .fold()
        .coalesce(unfold(), addVertex);
------------------------

If I call 'next()' directly on 'addVertex', I get "UnsupportedOperationException: Graph does not support adding vertices".

To make it work I use GraphTraversal from graph in the following way:
------------------------
Vertex v = g
        .inject("nothing")
        .union(getOrCreateNode(node))
        .next();
------------------------
Is it possible to use the 'addV()' method without injecting some object and calling 'union' on g?


2) A link consists of other nodes and links in a definite order, so I have a "prop_ids" property that contains the concatenated list of the children's ids.

To get a link's children I need to read the "prop_ids" property, split it, and retrieve the children by the corresponding ids.
Is this usually faster than storing edges from the parent link to its children, with a label like "CHILD" and a "position" property with values from 0 to N-1, then retrieving them all and sorting by "position"?
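
To make the two alternatives in this question concrete, here is a small plain-Java sketch of both encodings. The names (`ChildOrder`, `ChildEdge`, `encode`, `decode`, `sortedChildren`) are made up for illustration. The first pair mirrors the colon-separated "prop_ids" string; the second models "CHILD" edges carrying a "position" value. It says nothing about which one is faster inside JanusGraph.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ChildOrder {

    // Option A: ordered child ids packed into one string, e.g. "2304:3072:".
    static String encode(long... ids) {
        StringBuilder sb = new StringBuilder();
        for (long id : ids) {
            sb.append(id).append(':');
        }
        return sb.toString();
    }

    // Inverse of encode; an empty string decodes to an empty array.
    static long[] decode(String packed) {
        if (packed.isEmpty()) {
            return new long[0];
        }
        String[] parts = packed.split(":");
        long[] ids = new long[parts.length];
        for (int i = 0; i < parts.length; i++) {
            ids[i] = Long.parseLong(parts[i]);
        }
        return ids;
    }

    // Option B: one edge per child, each carrying its position (0 to N-1).
    static final class ChildEdge {
        final long childId;
        final int position;

        ChildEdge(long childId, int position) {
            this.childId = childId;
            this.position = position;
        }
    }

    // Edges may arrive in any order, so they are sorted by position first.
    static long[] sortedChildren(List<ChildEdge> edges) {
        List<ChildEdge> copy = new ArrayList<>(edges);
        copy.sort(Comparator.comparingInt(e -> e.position));
        long[] ids = new long[copy.size()];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = copy.get(i).childId;
        }
        return ids;
    }
}
```

Both options recover the same ordered id list; the difference is whether the order lives in one property value or is spread over edge properties that have to be sorted after retrieval.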


3) To create a link I need to check whether the link exists and, if not, create it.

To check whether the link exists I first get or create its children and then obtain their ids.
To create the link I need to reuse the children's ids, but it is not possible to use the GraphTraversal from the previous step a second time.
My code looks like this:
------------------------
GraphTraversal<Object, Vertex> addVertex = union(getOrCreateAtoms(link))
        .id()
        .fold()
        .as("ids")
        .addV(LABEL_LINK)
        .property(KIND, LABEL_LINK)
        .property(TYPE, link.type)
        .property(IDS, select("ids").flatMap(MAP_IDS))
        .property(T.id, storage.getNextId());

return union(getOrCreateAtoms(link))
        .id()
        .fold()
        .as("ids")
        .V()
        .hasLabel(LABEL_LINK)
        .has(KIND, LABEL_LINK)
        .has(TYPE, link.type)
        .has(IDS, select("ids").flatMap(MAP_IDS))
        .fold()
        .coalesce(unfold(), addVertex);
------------------------

where I first get the traversals for the children with 'getOrCreateAtoms(link)', then use union and a flatMap function to obtain their ids as one string.
I call 'getOrCreateAtoms(link)' twice: once when I check whether the link already exists in JanusGraph and once when I create it.
Is it possible to reuse the ids obtained in the 'check' step in the 'create' step?
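
The behaviour being asked for, resolving the children once and using the same id list for both the existence check and the creation, can be written down as a plain-Java reference model. This is only a sketch: `LinkStoreSketch`, `Tree`, and the string key format are hypothetical, and a `HashMap` stands in for the graph, so it does not show how to express the same thing in a single Gremlin traversal.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model; a HashMap stands in for the graph.
final class LinkStoreSketch {

    // A node has a non-null value and no children; a link has children and a null value.
    static final class Tree {
        final String type;
        final String value;
        final List<Tree> children;

        private Tree(String type, String value, List<Tree> children) {
            this.type = type;
            this.value = value;
            this.children = children;
        }

        static Tree node(String type, String value) {
            return new Tree(type, value, Collections.<Tree>emptyList());
        }

        static Tree link(String type, Tree... children) {
            return new Tree(type, null, Arrays.asList(children));
        }
    }

    private final Map<String, Long> idsByKey = new HashMap<>();
    private long nextId = 0;

    long getOrCreate(Tree atom) {
        String key;
        if (atom.value != null) {
            key = "Node|" + atom.type + "|" + atom.value;
        } else {
            // Children are resolved exactly once; the resulting id string is
            // then used both to look the link up and, if needed, to create it.
            StringBuilder childIds = new StringBuilder();
            for (Tree child : atom.children) {
                childIds.append(getOrCreate(child)).append(':');
            }
            key = "Link|" + atom.type + "|" + childIds;
        }
        Long existing = idsByKey.get(key);
        if (existing != null) {
            return existing;
        }
        long id = ++nextId;
        idsByKey.put(key, id);
        return id;
    }

    // Number of distinct atoms stored so far.
    int size() {
        return idsByKey.size();
    }
}
```

In this model two links with the same type but different children get different entries, because the lookup key includes the child ids and not only the type.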

4) My code does not work as I expect when I use a link which consists of other links.

For example, the following code works as expected:
-----------------
RawLink rawLink = new RawLink("Link1",
        new RawLink("Link2",
                new RawNode("Node1", "value1")),
        new RawLink("Link3",
                new RawNode("Node2", "value2")));

Link link = tx.getLink(rawLink);
System.out.printf("storage link: %s%n", link);
-----------------

The result is 'Link[1792]: Link1([2304, 3072])': the link with id 1792 and type 'Link1', which consists of two children with ids [2304, 3072].
When I dump the JanusGraph contents I get:
-----------------
Node[2560]: Node1(value1)
Link[2304]: Link2([2560])
Node[3328]: Node2(value2)
Link[3072]: Link3([3328])
Link[1792]: Link1([2304, 3072])
-----------------
and indeed the ids [2304, 3072] point to 'Link2' and 'Link3'.

Now I change the type of the top link to 'Link2' in the same link request:
----------------
// Type "Link1" has been changed to "Link2"
RawLink rawLink = new RawLink("Link2",
        new RawLink("Link2",
                new RawNode("Node1", "value1")),
        new RawLink("Link3",
                new RawNode("Node2", "value2")));
----------------

The result is 'Link[2304]: Link2([2560])': the retrieved link has type 'Link2' as expected, but only one child instead of two.
Here is the JanusGraph dump:
----------------
Node[2560]: Node1(value1)
Link[2304]: Link2([2560])
Node[3328]: Node2(value2)
Link[3072]: Link3([3328])
----------------

It shows that the link with id 2304 contains the node with id 2560, so it is not the top link with two children. The top link with two children was not even created. For some reason the code that checks whether the link exists compared only the type and not the ids.

To debug this I printed the Gremlin request, but because I use a lambda to compute the ids I only see
'prop_ids=[[SelectOneStep(last,ids), LambdaFlatMapStep(lambda)]]' for prop_ids property value.

What is the right way to check which value a lambda sets for a property in a Gremlin request?
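
One generic Java option, independent of Gremlin, is to wrap the lambda so that every call records its input and output. The sketch below is only an illustration: `TracedFunction` and `traced` are hypothetical names, and for a function that returns an `Iterator` (as `MAP_IDS` does) the recorded output would be the iterator object itself, so the interesting string would have to be recorded before it is wrapped.

```java
import java.util.List;
import java.util.function.Function;

final class TracedFunction {

    // Wraps fn so that each call appends "name: input -> output" to the log.
    static <A, B> Function<A, B> traced(String name, Function<A, B> fn, List<String> log) {
        return input -> {
            B output = fn.apply(input);
            log.add(name + ": " + input + " -> " + output);
            return output;
        };
    }
}
```

The wrapped function behaves like the original one, so it can be passed wherever the original was used, and the log can be printed after the traversal has been executed.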

Thanks,
Alexander.

The full example code:
----------------------------------
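import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.V;
import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.addV;
import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.select;
import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.unfold;
import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.union;

import java.io.Closeable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

import org.apache.tinkerpop.gremlin.process.traversal.Traverser;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.T;
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import org.janusgraph.core.JanusGraphTransaction;
import org.janusgraph.graphdb.database.StandardJanusGraph;
import org.janusgraph.graphdb.idmanagement.IDManager;

public class DataStorageSample {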

    private static final boolean DEBUG = false;

    public static void main(String[] args) throws Exception {

        try (JanusGraphStorage storage = getInMemoryStorage();
             JanusGraphStorageTransaction tx = storage.tx()) {

            // try to change Link1 type to Link2
            final RawLink rawLink = new RawLink("Link1",
                    new RawLink("Link2",
                            new RawNode("Node1", "value1")),
                    new RawLink("Link3",
                            new RawNode("Node2", "value2")));

            Link link = tx.getLink(rawLink);
            System.out.printf("raw link: %s%n", rawLink);
            System.out.printf("storage link: %s%n", link);

            tx.dump();
            tx.commit();
        }
    }

    private static JanusGraph getInMemoryGraph() {
        return JanusGraphFactory.build()
                .set("storage.backend", "inmemory")
                .set("graph.set-vertex-id", "true")
                .open();
    }

    private static JanusGraphStorage getInMemoryStorage() {
        return new JanusGraphStorage(getInMemoryGraph());
    }

    // JanusGraph Storage

    static class JanusGraphStorage implements Closeable {

        long currentId = 0;
        final JanusGraph graph;
        final IDManager idManager;

        public JanusGraphStorage(JanusGraph graph) {
            this.graph = graph;
            this.idManager = ((StandardJanusGraph) graph).getIDManager();
        }

        public JanusGraphStorageTransaction tx() {
            return new JanusGraphStorageTransaction(this);
        }

        @Override
        public void close() {
            graph.close();
        }

        public long getNextId() {
            return idManager.toVertexId(++currentId);
        }
    }

    static class JanusGraphStorageTransaction implements Closeable {

        // "type" is a reserved property name in JanusGraph
        static final String KIND = "prop_kind";
        static final String TYPE = "prop_type";
        static final String VALUE = "prop_value";
        static final String IDS = "prop_ids";

        static final String LABEL_NODE = "Node";
        static final String LABEL_LINK = "Link";

        final JanusGraphStorage storage;
        final JanusGraphTransaction tx;
        final GraphTraversalSource g;

        public JanusGraphStorageTransaction(JanusGraphStorage storage) {
            this.storage = storage;
            this.tx = storage.graph.newTransaction();
            this.g = tx.traversal();
        }

        public Node getNode(RawNode node) {
            GraphTraversal<String, Vertex> traversal = g
                    .inject("nothing")
                    .union(getOrCreateNode(node));

            if (DEBUG) {
                System.out.printf("get node: %s%n", traversal);
            }

            Vertex v = traversal.next();
            return new Node(id(v), node.type, node.value);
        }

        public Link getLink(RawLink link) {

            GraphTraversal<String, Vertex> traversal = g
                    .inject("nothing")
                    .union(getOrCreateLink(link));

            if (DEBUG) {
                System.out.printf("get link: %s%n", traversal);
            }

            Vertex v = traversal.next();
            return new Link(id(v), link.type, ids(v));
        }

        public void commit() {
            tx.commit();
        }

        @Override
        public void close() {
            tx.close();
        }

        private GraphTraversal<Object, Vertex> getOrCreateAtom(RawAtom atom) {
            if (atom instanceof RawNode) {
                return getOrCreateNode((RawNode) atom);
            } else if (atom instanceof RawLink) {
                return getOrCreateLink((RawLink) atom);
            } else {
                String msg = String.format("Unknown RawAtom class: %s", atom.getClass());
                throw new RuntimeException(msg);
            }
        }

        private GraphTraversal<Object, Vertex> getOrCreateNode(RawNode node) {

            GraphTraversal<Object, Vertex> addVertex = addV(LABEL_NODE)
                    .property(T.id, storage.getNextId())
                    .property(KIND, LABEL_NODE)
                    .property(TYPE, node.type)
                    .property(VALUE, node.value);

            return V()
                    .hasLabel(LABEL_NODE)
                    .has(TYPE, node.type)
                    .has(VALUE, node.value)
                    .fold()
                    .coalesce(unfold(), addVertex);
        }

        private GraphTraversal<Object, Vertex> getOrCreateLink(RawLink link) {

            GraphTraversal<Object, Vertex> addVertex = union(getOrCreateAtoms(link))
                    .id()
                    .fold()
                    .as("ids")
                    .addV(LABEL_LINK)
                    .property(KIND, LABEL_LINK)
                    .property(TYPE, link.type)
                    .property(IDS, select("ids").flatMap(MAP_IDS))
                    .property(T.id, storage.getNextId());

            return union(getOrCreateAtoms(link))
                    .id()
                    .fold()
                    .as("ids")
                    .V()
                    .hasLabel(LABEL_LINK)
                    .has(KIND, LABEL_LINK)
                    .has(TYPE, link.type)
                    .has(IDS, select("ids").flatMap(MAP_IDS))
                    .fold()
                    .coalesce(unfold(), addVertex);
        }

        private GraphTraversal<Object, Vertex>[] getOrCreateAtoms(RawLink link) {
            int arity = link.getArity();
            GraphTraversal<Object, Vertex>[] addAtoms = new GraphTraversal[arity];

            for (int i = 0; i < arity; i++) {
                addAtoms[i] = getOrCreateAtom(link.atoms[i]);
            }
            return addAtoms;
        }

        public void dump() {
            System.out.printf("--- Storage Dump ---%n");
            Iterator<Vertex> vertices = g.V();
            while (vertices.hasNext()) {
                Vertex v = vertices.next();
                String kind = v.property(KIND).value().toString();
                String type = v.property(TYPE).value().toString();
                Object id = v.id();
                if (LABEL_NODE.equals(kind)) {
                    String value = v.property(VALUE).value().toString();
                    System.out.printf("%s[%s]: %s(%s)%n", kind, id, type, value);
                } else {
                    System.out.printf("%s[%s]: %s(%s)%n", kind, id, type, Arrays.toString(ids(v)));
                }
            }
            System.out.printf("--- ------------ ---%n");
        }

        static long id(Vertex v) {
            return (long) v.id();
        }

        static long[] ids(Vertex v) {
            String ids = v.property(IDS).value().toString();
            return toIds(ids);
        }
    }

    static final Function<Traverser<Object>, Iterator<String>> MAP_IDS = t -> {
        ArrayList arrayList = (ArrayList) t.get();
        long[] ids = new long[arrayList.size()];
        for (int i = 0; i < arrayList.size(); i++) {
            ids[i] = (long) arrayList.get(i);
        }
        List<String> list = new ArrayList<>(1);
        list.add(idsToString(ids));
        return list.iterator();
    };

    static String idsToString(long... ids) {
        StringBuilder builder = new StringBuilder();
        for (long id : ids) {
            builder.append(id).append(':');
        }
        return builder.toString();
    }

    static long[] toIds(String str) {
        String[] split = str.split(":");

        long[] ids = new long[split.length];

        for (int i = 0; i < split.length; i++) {
            ids[i] = Long.parseLong(split[i]);
        }
        return ids;
    }

    // Raw Atoms
    static class RawAtom {

        final String type;

        public RawAtom(String type) {
            this.type = type;
        }
    }

    static class RawNode extends RawAtom {

        final String value;

        public RawNode(String type, String value) {
            super(type);
            this.value = value;
        }

        @Override
        public String toString() {
            return String.format("%s(%s)", type, value);
        }
    }

    static class RawLink extends RawAtom {

        final RawAtom[] atoms;

        public RawLink(String type, RawAtom... atoms) {
            super(type);
            this.atoms = atoms;
        }

        public int getArity() {
            return atoms.length;
        }

        @Override
        public String toString() {
            return String.format("%s(%s)", type, Arrays.toString(atoms));
        }
    }

    // Atoms in JanusGraph Storage

    static class Atom {
        final long id;
        final String type;

        public Atom(long id, String type) {
            this.id = id;
            this.type = type;
        }
    }

    static class Node extends Atom {
        final String value;

        public Node(long id, String type, String value) {
            super(id, type);
            this.value = value;
        }

        @Override
        public String toString() {
            return String.format("%s(%s) - Node[%d]", type, value, id);
        }
    }

    static class Link extends Atom {

        final long[] ids;

        public Link(long id, String type, long... ids) {
            super(id, type);
            this.ids = ids;
        }

        @Override
        public String toString() {
            return String.format("Link[%d]: %s(%s)", id, type, Arrays.toString(ids));
        }
    }
}
----------------------------------
