Forcing Janusgraph to use indices when performing traversal with Union step


brad@...
 

Hello,

I need to execute Janusgraph traversals, using a Union step, which contains multiple 'has' steps, each of which tests an indexed property for equality to a value.  Prior to the Union step, there is another 'has' step, which also tests an indexec property.

For example:

    g.V().has('indexed-prop1', 'value1').union(has('indexed-prop2', 'value2'), has('indexed-prop3', 'value3'))

How can I verify whether or not the traversal is using indices 'indexed-prop2' and 'indexed-prop3'?  I suspect that it is not.  The traversal seems to take a long time to run; however, when I simplify the traversal, by having only one 'has' step within the union, it runs very quickly.

What is the best way to execute this type of traversal.  My application is written in Java.

Thanks in advance,
Brad


Boxuan Li
 

Hi Brad,

Can you post the profile result of both queries? You can retrieve profile results by adding `.profile()` to the end of your query.

Best,
Boxuan


brad@...
 

TRAVERSAL:
[GraphStep(vertex,[]), HasStep([~label.eq(ten1.apm.version), ten1.apm.idx.type_id.eq(i_javaServiceInstance)]), UnionStep([[HasStep([ten1.apm.idx.display_name.eq(JavaServiceInstance3)]), EndStep], [HasStep([ten1.apm.idx.str4.eq(998)]), EndStep]]), RangeGlobalStep(0,2147483647), HasStep([ten1.apm.idx.discovery_ts.lte(1643313164000), ten1.apm.idx.last_seen_ts.gt(0)]), OrderGlobalStep([[value(ten1.apm.idx.last_seen_ts), desc]]), RangeGlobalStep(0,1000), GroupStep(value(uid),[FoldStep, OrderLocalStep([[value(ten1.apm.idx.last_seen_ts), desc]]), UnfoldStep, RangeGlobalStep(0,1)]), LambdaFlatMapStep(lambda), RangeGlobalStep(0,2500), PropertyMapStep([uid, ten1.apm.idx.display_name, ten1.apm.idx.last_seen_ts, ten1.apm.idx.discovery_ts, typ],value)]
Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[~label.eq(ten1.apm.version),...                     3           3          14.381    54.44
    \\_condition=(~label = ten1.apm.version AND ten1.apm.idx.type_id = i_javaServiceInstance)
    \\_orders=[]
    \\_isFitted=true
    \\_isOrdered=true
    \\_query=[(ten1.apm.idx.type_id = i_javaServiceInstance)](4000):index_ten1_apm_0
    \\_index=index_ten1_apm_0
    \\_index_impl=search
  optimization                                                                                 0.011
  optimization                                                                                 0.381
  backend-query                                                        3                      21.484
    \\_query=index_ten1_apm_0:[(ten1.apm.idx.type_id = i_javaServiceInstance)](4000):index_ten1_apm_0
    \\_limit=4000
UnionStep([[HasStep([ten1.apm.idx.display_name....                     2           2          10.005    37.87
  HasStep([ten1.apm.idx.display_name.eq(JavaSer...                     1           1           0.113
  EndStep                                                              1           1           0.076
  HasStep([ten1.apm.idx.str4.eq(998)])                                 1           1           0.208
  EndStep                                                              1           1           0.169
RangeGlobalStep(0,2147483647)                                          2           2           0.177     0.67
HasStep([ten1.apm.idx.discovery_ts.lte(16433131...                     2           2           0.616     2.33
OrderGlobalStep([[value(ten1.apm.idx.last_seen_...                     2           2           0.281     1.06
RangeGlobalStep(0,1000)                                                2           2           0.115     0.44
GroupStep(value(uid),[FoldStep, OrderLocalStep(...                     1           1           0.323     1.22
LambdaFlatMapStep(lambda)                                              2           2           0.082     0.31
RangeGlobalStep(0,2500)                                                2           2           0.062     0.23
PropertyMapStep([uid, ten1.apm.idx.display_name...                     2           2           0.373     1.41
                                            >TOTAL                     -           -          26.417        -
 


Boxuan Li
 

Hi Brad,

I can see that your traversal is using the index index_ten1_apm_0 from the following snippet:

  backend-query                                                        3                      21.484
    \\_query=index_ten1_apm_0:[(ten1.apm.idx.type_id = i_javaServiceInstance)](4000):index_ten1_apm_0
    \\_limit=4000

Then, JanusGraph uses in-memory filtering to check whether the results returned by the index_ten1_apm_0 satisfy your predicates has('indexed-prop2', 'value2’) or has('indexed-prop3', ‘value3’).

Do you have an index that contains both the field “indexed-prop1” and “indexed-prop2”, and an index that contains both the field “indexed-prop1” and “indexed-prop3”? If so, try replacing “union” step with “or” step. If not, then you could try creating those indexes, otherwise there is no optimization that JanusGraph can do.

Btw it looks like you are using a JanusGraph version < 0.6, which is out of maintenance. The same query should run moderately faster in the latest version due to a couple of optimizations.

Best,
Boxuan

On Feb 6, 2022, at 3:14 PM, brad@... wrote:

TRAVERSAL:
[GraphStep(vertex,[]), HasStep([~label.eq(ten1.apm.version), ten1.apm.idx.type_id.eq(i_javaServiceInstance)]), UnionStep([[HasStep([ten1.apm.idx.display_name.eq(JavaServiceInstance3)]), EndStep], [HasStep([ten1.apm.idx.str4.eq(998)]), EndStep]]), RangeGlobalStep(0,2147483647), HasStep([ten1.apm.idx.discovery_ts.lte(1643313164000), ten1.apm.idx.last_seen_ts.gt(0)]), OrderGlobalStep([[value(ten1.apm.idx.last_seen_ts), desc]]), RangeGlobalStep(0,1000), GroupStep(value(uid),[FoldStep, OrderLocalStep([[value(ten1.apm.idx.last_seen_ts), desc]]), UnfoldStep, RangeGlobalStep(0,1)]), LambdaFlatMapStep(lambda), RangeGlobalStep(0,2500), PropertyMapStep([uid, ten1.apm.idx.display_name, ten1.apm.idx.last_seen_ts, ten1.apm.idx.discovery_ts, typ],value)]
Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[~label.eq(ten1.apm.version),...                     3           3          14.381    54.44
    \\_condition=(~label = ten1.apm.version AND ten1.apm.idx.type_id = i_javaServiceInstance)
    \\_orders=[]
    \\_isFitted=true
    \\_isOrdered=true
    \\_query=[(ten1.apm.idx.type_id = i_javaServiceInstance)](4000):index_ten1_apm_0
    \\_index=index_ten1_apm_0
    \\_index_impl=search
  optimization                                                                                 0.011
  optimization                                                                                 0.381
  backend-query                                                        3                      21.484
    \\_query=index_ten1_apm_0:[(ten1.apm.idx.type_id = i_javaServiceInstance)](4000):index_ten1_apm_0
    \\_limit=4000
UnionStep([[HasStep([ten1.apm.idx.display_name....                     2           2          10.005    37.87
  HasStep([ten1.apm.idx.display_name.eq(JavaSer...                     1           1           0.113
  EndStep                                                              1           1           0.076
  HasStep([ten1.apm.idx.str4.eq(998)])                                 1           1           0.208
  EndStep                                                              1           1           0.169
RangeGlobalStep(0,2147483647)                                          2           2           0.177     0.67
HasStep([ten1.apm.idx.discovery_ts.lte(16433131...                     2           2           0.616     2.33
OrderGlobalStep([[value(ten1.apm.idx.last_seen_...                     2           2           0.281     1.06
RangeGlobalStep(0,1000)                                                2           2           0.115     0.44
GroupStep(value(uid),[FoldStep, OrderLocalStep(...                     1           1           0.323     1.22
LambdaFlatMapStep(lambda)                                              2           2           0.082     0.31
RangeGlobalStep(0,2500)                                                2           2           0.062     0.23
PropertyMapStep([uid, ten1.apm.idx.display_name...                     2           2           0.373     1.41
                                            >TOTAL                     -           -          26.417        -
 


brad@...
 

Thank you for your reply.

We can look into migrating to version 0.6 (or later)...

There exists an index for indexed-prop1, indexed-prop2, and indexed-prop3, but there are no multi-column indices (e.g., indexed-prop1 and indexed-prop2).  In our application, the columns that might be used for the search are specified by the user, and we wouldn't have any way of knowing in advance what combination of columns they might ask for.  The only thing that we can do now is to ensure that all of the columns that are used in a 'has' step are indexed.  

One approach that I thought might be an improvement is to perform the first traversal (i.e., " g.V().has('indexed-prop1', 'value1')") separately, collect the set of vertices which satisfy this query, then perform an additional query for each 'has' step that is currently in the 'union' step.  So, for example, we run g.V().has('indexed-prop1', 'value1') initially, and it returns 3 vertices (V1, V2, and V3).  We then run a traversal for indexed-prop2:   "g.V(V1, V2, V3).has('indexed-prop2', 'value2')", which returns a subset of the vertices returned by the first traversal.  Then, run another traversal, this time:   "g.V(V1, V2, V3).has('indexed-prop3', 'value3')", again returning a subset of the vertices.  Finally, figure out (programatically, without executing another traversal) the union of the vertices returned by the last 2 traversals.  This is an inelegant and brute-force technique which would probably work, but I would rather do this in a single traversal, and I haven't been able to figure out how to do this.  Can you recommend an approach for doing this kind of action in a single traversal?

Thanks,
Brad


Boxuan Li
 

The approach you proposed should work as good/bad as your original single query in theory. I would be surprised if this approach works better than your original single query, but if so, please let me know.

The optimal approach depends on your data. Since each of your indexes only covers a single index, there are two strategies in general:

1. Leverage one index to return results for one condition, and do in-memory filtering for the other condition. Your original query does this. This is useful in some scenarios (e.g. one index returns many more results than another index).

2. Leverage both indexes and do in-memory intersection to return results that satisfy both conditions. This might be something you want, and it is useful in some scenarios (e.g. both indices return roughly the same results and there is not much overlapping between two sets of results). If this is what you want, try rewriting your query as g.V().has("indexed-prop1", "value1").or(__.has("indexed-prop2", "value2"), __.has("indexed-prop3", "value3")). Essentially, change the union step into "or" step.

Hope this helps!

Best,
Boxuan