[DISCUSS] Automatic reversal of traversal to exploit index
conf=new BaseConfiguration()
graph = JanusGraphFactory.open('c:/dev/janus/janusgraph-0.1.1-hadoop2/conf/janus-berkeley-gw.properties')
mgmt=graph.openManagement()
assetid = mgmt.makePropertyKey('assetid').dataType(String.class).make()
asset=mgmt.makeVertexLabel('Asset').make()
mgmt.buildIndex('assetById',Vertex.class).addKey(assetid).indexOnly(asset).buildCompositeIndex()
mgmt.commit()
for (i in 1..100000) {
p=graph.addVertex(label,'Person','name',i)
a=graph.addVertex(label,'Asset','assetid','asset-num-'+i)
p.addEdge('owns',a)
}
graph.tx().commit()
// 'Forward' query - starts with the non-indexed vertex label and traverses 'out' to connected vertex with indexed attribute
t=System.currentTimeMillis();g.V().hasLabel('Person').out('owns').hasLabel('Asset').has('assetid','asset-num-7').iterate();System.currentTimeMillis()-t
// Issues the usual warning: 14:18:26 WARN org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Query requires iterating over all vertices [(~label = Person)]. For better perform
// 2773ms
// 'Reversed' query - starts with the vertex with indexed attribute and traverses 'in' to connected non-indexed vertex
t=System.currentTimeMillis();g.V().hasLabel('Asset').has('assetid','asset-num-7').in('owns').hasLabel('Person').iterate();System.currentTimeMillis()-t
// 2ms
Note that the reversed traversal is many times faster - approx 1000 faster in this particular example.
Given that JanusGraph knows what has been indexed, could JanusGraph determine *automatically* that the traversal would be more efficient starting at the indexed vertex and run backwards? If so, could this be extended to auto-reversal of segments of longer traversals?
Many thanks
Graham
On Tuesday, July 18, 2017 at 9:58:51 AM UTC-5, graham...@... wrote:
If a graph has an index on a particular attribute of a vertex with a particular label, and a user issues a gremlin query that attempts to traverse from a non-indexed vertex to the indexed vertex, then can the traversal (or at least this part of it) be reversed to exploit the availability of the index?Here's an example:conf=new BaseConfiguration()
graph = JanusGraphFactory.open('c:/
dev/janus/janusgraph-0.1.1- hadoop2/conf/janus-berkeley- gw.properties')
mgmt=graph.openManagement()
assetid = mgmt.makePropertyKey('assetid'
).dataType(String.class).make( ) asset=mgmt.makeVertexLabel('
Asset').make() mgmt.buildIndex('assetById',
Vertex.class).addKey(assetid). indexOnly(asset). buildCompositeIndex() mgmt.commit()
for (i in 1..100000) {
p=graph.addVertex(label,'
Person','name',i) a=graph.addVertex(label,'
Asset','assetid','asset-num-'+ i) p.addEdge('owns',a)
}
graph.tx().commit()
// 'Forward' query - starts with the non-indexed vertex label and traverses 'out' to connected vertex with indexed attribute
t=System.currentTimeMillis();
g.V().hasLabel('Person').out(' owns').hasLabel('Asset').has(' assetid','asset-num-7'). iterate();System. currentTimeMillis()-t // Issues the usual warning: 14:18:26 WARN org.janusgraph.graphdb.
transaction. StandardJanusGraphTx - Query requires iterating over all vertices [(~label = Person)]. For better perform // 2773ms
// 'Reversed' query - starts with the vertex with indexed attribute and traverses 'in' to connected non-indexed vertex
t=System.currentTimeMillis();
g.V().hasLabel('Asset').has(' assetid','asset-num-7').in(' owns').hasLabel('Person'). iterate();System. currentTimeMillis()-t // 2ms
Note that the reversed traversal is many times faster - approx 1000 faster in this particular example.
Given that JanusGraph knows what has been indexed, could JanusGraph determine *automatically* that the traversal would be more efficient starting at the indexed vertex and run backwards? If so, could this be extended to auto-reversal of segments of longer traversals?
Many thanks
Graham
Thanks Ted, That's really helpful. I guess ideally this should be implemented by an OptimizationStrategy, so that it would be portable across graph providers. I guess that requires that we can determine what indexes exist, etc, in a portable manner. If that's not possible - e.g. because we need to use Janus's mgmt system - then the alternative appears to be a ProviderOptimizationStrategy that would be specific to JanusGraph. I will try to take a look at it.