Why are Index Queries so much slower than Vertex Lookups?


sofia...@...
 

TLDR: 

I did a small experiment where Index Queries are up to 50x slower than Vertex Lookups.

I want to know if I have something wrong in my setup, or if this is the expected behavior.


Long Version:

I did a small experiment with JanusGraph (backed by Cassandra), with the goal of evaluating whether JanusGraph can be used as the storage layer of my application.


In this experiment, I try to evaluate JanusGraph's performance when querying multiple nodes through an index, using the Gremlin predicate P.within.


The query I’m doing is the following:

final Long result = graph.traversal().V()
        .has("internal_id", P.within(vertexIds))
        .count().toList().get(0);


Here vertexIds is an array of random possible values of internal_id, whose size I vary. I have previously configured a vertex index on the internal_id property, and I can see that this index is being used when I query (see [1] below).
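For reference, the index setup looks roughly like this (a sketch, not my exact schema code; the Long data type is illustrative, while idComposite matches the index name that shows up in the profile output below):

import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.janusgraph.core.PropertyKey;
import org.janusgraph.core.schema.JanusGraphManagement;

// Sketch: composite index "idComposite" on the internal_id property
JanusGraphManagement mgmt = graph.openManagement();
PropertyKey internalId = mgmt.makePropertyKey("internal_id").dataType(Long.class).make();
mgmt.buildIndex("idComposite", Vertex.class).addKey(internalId).buildCompositeIndex();
mgmt.commit();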


The results I’m seeing are the following:

100 nodes - 214 ms
1 000 nodes - 1 636 ms
10 000 nodes - 36 604 ms
100 000 nodes - 281 551 ms (almost 5 min)


I was not expecting such bad performance, so I did another experiment to see whether the problem was my Cassandra setup. This time, instead of querying by the indexed property, I query by vertex id directly. For this, I first store the mapping between my internal ids and the JanusGraph vertex ids in RocksDB.



With this, my previous query is simplified to:

final Long result = graph.traversal()
        .V(vertices)
        .count().toList().get(0);


Here vertices already contains JanusGraph vertex ids.
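The id resolution step looks roughly like this (a sketch, not my exact code; the RocksDB path and the 8-byte key/value encoding are illustrative):

import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Sketch: resolve internal ids to JanusGraph vertex ids through a RocksDB mapping,
// then count the vertices by id (path and encoding are illustrative).
RocksDB.loadLibrary();
try (RocksDB db = RocksDB.open("/path/to/id-mapping")) {
    List<Object> vertices = new ArrayList<>();
    for (long internalId : internalIds) {
        byte[] key = ByteBuffer.allocate(Long.BYTES).putLong(internalId).array();
        byte[] value = db.get(key);                        // vertex id stored as 8 bytes
        vertices.add(ByteBuffer.wrap(value).getLong());
    }
    final Long result = graph.traversal().V(vertices.toArray()).count().toList().get(0);
} catch (RocksDBException e) {
    throw new RuntimeException(e);
}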


The results I got were much better, even when I count the time needed to fetch the vertex ids from RocksDB:

100 nodes - 12 ms
1 000 nodes - 58 ms
10 000 nodes - 668 ms
100 000 nodes - 63 222 ms


Plotting this difference makes the problem much clearer (x-axis: number of nodes I'm querying; y-axis: time in ms the system takes to return the results).



My question here is: why are these results so different?

Am I missing some configuration, or is there something I should tune for this case?



Additional Info:

I tried to enable the query batching property when opening the graph:

JanusGraphFactory.build()
        // … (other configuration options)
        .set("storage.cql.query.batch", true)
        .open();


However, I'm not completely sure that this property is actually being picked up by JanusGraph, since I saw no improvement from using it (tips on how to check this are highly appreciated).
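The only check I could think of so far is reading the value back from the opened graph (a sketch; I'm assuming here that the TinkerPop Graph.configuration() view reflects the options the graph was actually opened with):

// Sketch: print the effective value of the option as seen by the opened graph
Object batchSetting = graph.configuration().getProperty("storage.cql.query.batch");
System.out.println("storage.cql.query.batch = " + batchSetting);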


My setup: JanusGraph embedded, with 3 Cassandra nodes (each on a separate machine).

[1] Profile Query Result:

gremlin> graph.traversal().V().has("internal_id",P.within([1234,212,989,199])).profile()

==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[internal_id.within([1234, 21...                    12          12         226.852   100.00
    \_condition=((internal_id = 1234 OR internal_id = 212 OR internal_id = 989 OR internal_id = 199))
    \_orders=[]
    \_isFitted=true
    \_isOrdered=true
    \_query=multiKSQ[4]@2147483647
    \_index=idComposite
  optimization                                                                                 4.521
  optimization                                                                               132.700
  backend-query                                                        12                      48.379
    \_query=idComposite:multiKSQ[4]@2147483647
                                            >TOTAL                     -           -         226.852        -

