Custom analyzer with traversal queries


florent@...
 

Hello everyone,


As it's my first post, I'd first thank you all for your work ! It has been pretty neat to discover all of the work !

We've started to investigate the use of custom analyzers in our schema (-> https://docs.janusgraph.org/v0.5/index-backend/field-mapping/#for-elasticsearch), where ES is used as an indexing backend.

From what I've seen, initial selections will indeed use the index (e.g. `traversal().V().has(...)`). However, it isn't always used, and sometimes the janusgraph implementation of the predicate is used (e.g. when `traversal().V(idx1, idx2, ...).has(...)`). In this latter case, it's the predicates as implemented here that are run. For text fields, it looks like a tokenization and lower-casing are perform by them directly.

I haven't seen mention of this case in the documentation. So, I'm wondering if my observations are valid.

If so, is there a way to force the usage of ES indices? An IDs query could be way to implement such feature (at least for ES).


All the best,
Flo


Boxuan Li
 

Hi Flo,

Elasticsearch Index is only used when your query is a global query (a.k.a. graph-centric query in JanusGraph), i.e. look up vertices with conditions. `traversal.V().has(…)` is a global query.

On the other hand, `traversal().V(idx1, idx2, …).has(…)` is a vertex-centric query rather than global query. Elasticsearch is not involved, thus your custom analyzers are not used.

Your observations are valid. The documentation is not very clear indeed.

Unfortunately I don’t see any workaround here. Probably you could modify Text.java and build your own JanusGraph. You may also want to create a feature request to allow custom analyzers without indexing backends. Probably we could make Text.java pluggable (not sure if that’s possible).

Best regards,
Boxuan

On Jun 17, 2021, at 3:56 PM, florent@... wrote:

Hello everyone,


As it's my first post, I'd first thank you all for your work ! It has been pretty neat to discover all of the work !

We've started to investigate the use of custom analyzers in our schema (-> https://docs.janusgraph.org/v0.5/index-backend/field-mapping/#for-elasticsearch), where ES is used as an indexing backend.

From what I've seen, initial selections will indeed use the index (e.g. `traversal().V().has(...)`). However, it isn't always used, and sometimes the janusgraph implementation of the predicate is used (e.g. when `traversal().V(idx1, idx2, ...).has(...)`). In this latter case, it's the predicates as implemented here that are run. For text fields, it looks like a tokenization and lower-casing are perform by them directly.

I haven't seen mention of this case in the documentation. So, I'm wondering if my observations are valid.

If so, is there a way to force the usage of ES indices? An IDs query could be way to implement such feature (at least for ES).


All the best,
Flo


florent@...
 

Thank you !

Currently, we have only a limited set of features to support from ES, so we are most likely going to wrap the Janus' text predicates.

All the best,
Flo