Re: Centric Indexes failing to support all conditions for better performance.
BO XUAN LI <libo...@...>
Hi Christopher,
isFitted = true basically means no in-memory filtering is needed. If you see isFitted = false, it does not necessarily mean vertex-centric indexes are not used. It could be the case that some vertex-centric index is used, but further in-memory filtering is still needed.
Conversely, if you see isFitted = true, it does not necessarily mean any index is used. It could be the case that you are fetching all edges of a given vertex, so there is simply nothing to filter.
I totally understand your confusion because the documentation does not explain how the vertex-centric index is built. In JanusGraph, vertices and edges are stored in the “edgestore” store, while composite indexes are stored in the “graphindex” store. Mixed indexes are stored in an external index backend such as Elasticsearch. This might be a bit counter-intuitive, but vertex-centric indexes are stored in the “edgestore” store. Recall how edges are stored (see https://docs.janusgraph.org/advanced-topics/data-model/#individual-edge-layout).
Roughly speaking, if you don’t have any vertex-centric index, then your edge is stored once for one endpoint. If you have one vertex-centric index, then applicable edges are stored twice. If you have two vertex-centric indexes, then applicable edges are stored three times, and so on. These copies, although seemingly duplicates, have different “sort key”s, each conforming to its corresponding vertex-centric index. Let’s say you have built a “battlesByRating” vertex-centric index on the property “rating”; then, apart from the ordinary edge, JanusGraph creates an additional edge whose “sort key” is the rating value. Because the “column” is sorted in the underlying data storage (e.g. “column” in the JanusGraph model is mapped to “clustering column” in Cassandra), you essentially gain the ability to search the edges by “rating” value/range.
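To make the “sorted column” idea concrete, here is a minimal sketch in plain Python. It is only a toy model, not JanusGraph code: the edge ids, the ratings, and the helper name edges_with_rating_between are all made up for illustration. The point is that once the extra edge copies are sorted by rating, a value/range lookup becomes a single contiguous slice.

```python
from bisect import bisect_left, bisect_right

# Hypothetical "battled" edges of one vertex: (edge id, rating).
edges = [("e1", 4.0), ("e2", 1.5), ("e3", 3.0), ("e4", 2.0), ("e5", 5.0)]

# Toy stand-in for the extra edge copies written for a rating-based index,
# kept sorted by rating the way columns are sorted inside a storage row.
index_columns = sorted((rating, edge_id) for edge_id, rating in edges)
ratings = [rating for rating, _ in index_columns]

def edges_with_rating_between(lo, hi):
    """Return ids of edges with lo <= rating <= hi by reading one contiguous slice."""
    start = bisect_left(ratings, lo)   # first column with rating >= lo
    end = bisect_right(ratings, hi)    # one past the last column with rating <= hi
    return [edge_id for _, edge_id in index_columns[start:end]]

print(edges_with_rating_between(2.0, 4.0))  # ['e4', 'e3', 'e1']
```

Without the sorted copies, the same lookup would have to read every edge of the vertex and compare ratings one by one.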
What happens when your vertex-centric index has two properties like the following?
> mgmt.buildEdgeIndex(battled, 'battlesByRatingAndTime', Direction.OUT, Order.asc, rating, time)
Now your “sort key” is a combination of “rating” and “time” (note “rating” comes before “time”). Under this vertex-centric index, “sort key”s look like this:
(rating=1, time=2), (rating=1, time=3), (rating=2, time=1), (rating=2, time=5), (rating=4, time=2), …
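This explains why isFitted = true when your query is has('rating', 5.0).has('time', inside(10, 50)) but not when your query is has('time', 5.0).has('rating', inside(10, 50)). In the first case the matching edges form one contiguous run of “sort key”s; in the second case edges with the same “time” are scattered across different “rating” values, so some in-memory filtering is still needed. Again, note that isFitted = false does not necessarily mean your query is not optimized by a vertex-centric index. I think the profiler should be improved to state whether and which vertex-centric index is used.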
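The difference between the two queries can be sketched with the sort keys listed above. Again this is a toy Python model under my own assumptions (inclusive ranges for simplicity, hypothetical helper names slice_for and scan_for), not how JanusGraph itself decides isFitted. It only shows why “equality on rating, range on time” needs no filtering, while “equality on time, range on rating” does.

```python
from bisect import bisect_left, bisect_right

# Sort keys from the example above, as (rating, time) tuples in index order.
sort_keys = [(1, 2), (1, 3), (2, 1), (2, 5), (4, 2)]

def slice_for(rating_eq, time_lo, time_hi):
    """Equality on rating, range on time: one contiguous slice, nothing to filter."""
    start = bisect_left(sort_keys, (rating_eq, time_lo))
    end = bisect_right(sort_keys, (rating_eq, time_hi))
    return sort_keys[start:end]

def scan_for(time_eq, rating_lo, rating_hi):
    """Range on rating, equality on time: slice by rating, then filter in memory.

    Returns the matching keys and the number of keys that had to be read.
    """
    start = bisect_left(sort_keys, (rating_lo, float("-inf")))
    end = bisect_right(sort_keys, (rating_hi, float("inf")))
    candidates = sort_keys[start:end]
    matches = [key for key in candidates if key[1] == time_eq]
    return matches, len(candidates)

print(slice_for(1, 2, 3))  # [(1, 2), (1, 3)]
print(scan_for(2, 1, 4))   # ([(1, 2), (4, 2)], 5)
```

In the second call five keys are read to return two, and that gap is the in-memory filtering. The rating range still narrows what has to be read, which is why isFitted = false alone does not tell you that no index was used.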
I am not quite sure about the case b) you mentioned. It seems to be a design consideration, but right now I cannot tell why it is there.
“hasNot” almost never uses indexes because JanusGraph cannot index something that does not exist. (Note that a “null” value is not valid in JanusGraph.)
Hope this helps.