[DISCUSS] Automatic reversal of traversal to exploit index


graham...@...
 

If a graph has an index on a particular attribute of a vertex with a particular label, and a user issues a gremlin query that attempts to traverse from a non-indexed vertex to the indexed vertex, then can the traversal (or at least this part of it) be reversed to exploit the availability of the index?

Here's an example:

conf=new BaseConfiguration()

graph = JanusGraphFactory.open('c:/dev/janus/janusgraph-0.1.1-hadoop2/conf/janus-berkeley-gw.properties')


mgmt=graph.openManagement()

assetid = mgmt.makePropertyKey('assetid').dataType(String.class).make()

asset=mgmt.makeVertexLabel('Asset').make()

mgmt.buildIndex('assetById',Vertex.class).addKey(assetid).indexOnly(asset).buildCompositeIndex()

mgmt.commit()



for (i in 1..100000) {

  p=graph.addVertex(label,'Person','name',i)

  a=graph.addVertex(label,'Asset','assetid','asset-num-'+i)

  p.addEdge('owns',a)

}

graph.tx().commit()


// 'Forward' query - starts with the non-indexed vertex label and traverses 'out' to connected vertex with indexed attribute

t=System.currentTimeMillis();g.V().hasLabel('Person').out('owns').hasLabel('Asset').has('assetid','asset-num-7').iterate();System.currentTimeMillis()-t

// Issues the usual warning: 14:18:26 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [(~label = Person)]. For better perform

// 2773ms


// 'Reversed' query - starts with the vertex with indexed attribute and traverses 'in' to connected non-indexed vertex

t=System.currentTimeMillis();g.V().hasLabel('Asset').has('assetid','asset-num-7').in('owns').hasLabel('Person').iterate();System.currentTimeMillis()-t

// 2ms


Note that the reversed traversal is many times faster - approx 1000 faster in this particular example.


Given that JanusGraph knows what has been indexed, could JanusGraph determine *automatically* that the traversal would be more efficient starting at the indexed vertex and run backwards? If so, could this be extended to auto-reversal of segments of longer traversals?


Many thanks

  Graham



Ted Wilmes <twi...@...>
 

Hi Graham,
That's a great question. JanusGraph could perform this optimization, but as you're seeing, it doesn't now. This functionality would involve developing a new traversal strategy [1] that would take Janus schema into account, identify patterns where this reversal could be applied, and the traversal would be transparently rewritten to produce a traversal that provides an equivalent (albeit faster) result.

--Ted


On Tuesday, July 18, 2017 at 9:58:51 AM UTC-5, graham...@... wrote:
If a graph has an index on a particular attribute of a vertex with a particular label, and a user issues a gremlin query that attempts to traverse from a non-indexed vertex to the indexed vertex, then can the traversal (or at least this part of it) be reversed to exploit the availability of the index?

Here's an example:

conf=new BaseConfiguration()

graph = JanusGraphFactory.open('c:/dev/janus/janusgraph-0.1.1-hadoop2/conf/janus-berkeley-gw.properties')


mgmt=graph.openManagement()

assetid = mgmt.makePropertyKey('assetid').dataType(String.class).make()

asset=mgmt.makeVertexLabel('Asset').make()

mgmt.buildIndex('assetById',Vertex.class).addKey(assetid).indexOnly(asset).buildCompositeIndex()

mgmt.commit()



for (i in 1..100000) {

  p=graph.addVertex(label,'Person','name',i)

  a=graph.addVertex(label,'Asset','assetid','asset-num-'+i)

  p.addEdge('owns',a)

}

graph.tx().commit()


// 'Forward' query - starts with the non-indexed vertex label and traverses 'out' to connected vertex with indexed attribute

t=System.currentTimeMillis();g.V().hasLabel('Person').out('owns').hasLabel('Asset').has('assetid','asset-num-7').iterate();System.currentTimeMillis()-t

// Issues the usual warning: 14:18:26 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [(~label = Person)]. For better perform

// 2773ms


// 'Reversed' query - starts with the vertex with indexed attribute and traverses 'in' to connected non-indexed vertex

t=System.currentTimeMillis();g.V().hasLabel('Asset').has('assetid','asset-num-7').in('owns').hasLabel('Person').iterate();System.currentTimeMillis()-t

// 2ms


Note that the reversed traversal is many times faster - approx 1000 faster in this particular example.


Given that JanusGraph knows what has been indexed, could JanusGraph determine *automatically* that the traversal would be more efficient starting at the indexed vertex and run backwards? If so, could this be extended to auto-reversal of segments of longer traversals?


Many thanks

  Graham



graham...@...
 

Thanks Ted, That's really helpful. I guess ideally this should be implemented by an OptimizationStrategy, so that it would be portable across graph providers. I guess that requires that we can determine what indexes exist, etc, in a portable manner. If that's not possible - e.g. because we need to use Janus's mgmt system - then the alternative appears to be a ProviderOptimizationStrategy that would be specific to JanusGraph. I will try to take a look at it.