[DISCUSS] Automatic reversal of traversal to exploit index
graham...@...
Thanks Ted, That's really helpful. I guess ideally this should be implemented by an OptimizationStrategy, so that it would be portable across graph providers. I guess that requires that we can determine what indexes exist, etc, in a portable manner. If that's not possible - e.g. because we need to use Janus's mgmt system - then the alternative appears to be a ProviderOptimizationStrategy that would be specific to JanusGraph. I will try to take a look at it. |
|
Ted Wilmes <twi...@...>
Hi Graham, That's a great question. JanusGraph could perform this optimization, but as you're seeing, it doesn't now. This functionality would involve developing a new traversal strategy [1] that would take Janus schema into account, identify patterns where this reversal could be applied, and the traversal would be transparently rewritten to produce a traversal that provides an equivalent (albeit faster) result. --Ted On Tuesday, July 18, 2017 at 9:58:51 AM UTC-5, graham...@... wrote:
|
|
graham...@...
If a graph has an index on a particular attribute of a vertex with a particular label, and a user issues a gremlin query that attempts to traverse from a non-indexed vertex to the indexed vertex, then can the traversal (or at least this part of it) be reversed to exploit the availability of the index? Here's an example: conf=new BaseConfiguration() graph = JanusGraphFactory.open('c:/dev/janus/janusgraph-0.1.1-hadoop2/conf/janus-berkeley-gw.properties')
mgmt=graph.openManagement() assetid = mgmt.makePropertyKey('assetid').dataType(String.class).make() asset=mgmt.makeVertexLabel('Asset').make() mgmt.buildIndex('assetById',Vertex.class).addKey(assetid).indexOnly(asset).buildCompositeIndex() mgmt.commit()
for (i in 1..100000) { p=graph.addVertex(label,'Person','name',i) a=graph.addVertex(label,'Asset','assetid','asset-num-'+i) p.addEdge('owns',a) } graph.tx().commit() // 'Forward' query - starts with the non-indexed vertex label and traverses 'out' to connected vertex with indexed attribute t=System.currentTimeMillis();g.V().hasLabel('Person').out('owns').hasLabel('Asset').has('assetid','asset-num-7').iterate();System.currentTimeMillis()-t // Issues the usual warning: 14:18:26 WARN org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Query requires iterating over all vertices [(~label = Person)]. For better perform // 2773ms // 'Reversed' query - starts with the vertex with indexed attribute and traverses 'in' to connected non-indexed vertex t=System.currentTimeMillis();g.V().hasLabel('Asset').has('assetid','asset-num-7').in('owns').hasLabel('Person').iterate();System.currentTimeMillis()-t // 2ms Note that the reversed traversal is many times faster - approx 1000 faster in this particular example. Given that JanusGraph knows what has been indexed, could JanusGraph determine *automatically* that the traversal would be more efficient starting at the indexed vertex and run backwards? If so, could this be extended to auto-reversal of segments of longer traversals? Many thanks Graham |
|