P.neq() predicate uses wrong ES mapping
sergeymetallic@...
JanusGraph setup:
Storage backend: Scylla 3
Indexing backend: Elasticsearch 6
JG version: 0.5.3

Steps to reproduce:
1) Create a vertex with two fields ("x" and "y") mapped in the ES index as TEXTSTRING
2) Insert a node with values: x="anyvalue", y="??"
3) Execute these queries:
Expected result:
Actual result:
Observation: the issue appears to be in this line: https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-es/src/main/java/org/janusgraph/diskstorage/es/ElasticSearchIndex.java#L959
The code checks for Cmp.EQUAL but not for Cmp.NOT_EQUAL, so for NOT_EQUAL the tokenized field is used.
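The field choice described in the observation can be sketched as below. This is a hypothetical, heavily simplified model, not the actual ElasticSearchIndex code: the Cmp enum is a two-value stand-in for JanusGraph's comparison predicates, and fieldAsReported/fieldProposed are illustrative names. Only the "_STRING" suffix of the non-tokenized field is taken from this thread.

```java
public class FieldSelectionSketch {
    // Hypothetical stand-in for the two JanusGraph comparison predicates discussed here.
    enum Cmp { EQUAL, NOT_EQUAL }

    // Suffix of the non-tokenized sub-field, as described in the thread ("x" vs "x_STRING").
    static final String STRING_SUFFIX = "_STRING";

    // Behaviour as reported: for a TEXTSTRING key, only EQUAL is routed to the
    // non-tokenized field; NOT_EQUAL falls through to the tokenized field.
    static String fieldAsReported(String key, Cmp cmp, boolean textString) {
        if (textString && cmp == Cmp.EQUAL) {
            return key + STRING_SUFFIX;
        }
        return key;
    }

    // Behaviour the report asks for: NOT_EQUAL uses the same non-tokenized field as EQUAL.
    static String fieldProposed(String key, Cmp cmp, boolean textString) {
        if (textString && (cmp == Cmp.EQUAL || cmp == Cmp.NOT_EQUAL)) {
            return key + STRING_SUFFIX;
        }
        return key;
    }

    public static void main(String[] args) {
        for (Cmp cmp : Cmp.values()) {
            System.out.println(cmp + ": reported=" + fieldAsReported("y", cmp, true)
                    + ", proposed=" + fieldProposed("y", cmp, true));
        }
    }
}
```

Running main prints "EQUAL: reported=y_STRING, proposed=y_STRING" and "NOT_EQUAL: reported=y, proposed=y_STRING", which is the asymmetry the observation points at.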
hadoopmarc@...
Hi Sergey,
I think I see your point, but for completeness, could you be explicit about step 1) and share your mgmt.buildIndex() statements?

Best wishes,
Marc
sergeymetallic@...
Hi Marc,
Something like this: var index = janusGraphManagement.
hadoopmarc@...
Hi Sergej,
The example string "??" you used is not an ordinary string: apparently, somewhere in Elasticsearch it is interpreted as a wildcard. See my transcript below: with other property values, the index behaves as both of us expect. I made some attempts to escape the question marks in your example string, like "\\?", but without success. The JanusGraph documentation says very little about the use of wildcards with indexing backends.

Best wishes,
Marc

bin/janusgraph.sh start
bin/gremlin.sh

graph = JanusGraphFactory.open('conf/janusgraph-cql-es.properties')
mgmt = graph.openManagement()
index = mgmt.buildIndex("indexname", Vertex.class)
xproperty = mgmt.makePropertyKey("x").dataType(String.class).make();
yproperty = mgmt.makePropertyKey("y").dataType(String.class).make();
index.addKey(xproperty, Mapping.TEXTSTRING.asParameter())
index.addKey(yproperty, Mapping.TEXTSTRING.asParameter())
index.buildMixedIndex("search")
mgmt.commit()
ManagementSystem.awaitGraphIndexStatus(graph, 'indexname').status(SchemaStatus.REGISTERED, SchemaStatus.ENABLED).call()
==>GraphIndexStatusReport[success=true, indexName='indexname', targetStatus=[REGISTERED, ENABLED], notConverged={}, converged={x=ENABLED, y=ENABLED}, elapsed=PT0.017S]
g = graph.traversal()
g.addV('Some').property('x', 'x1').property('y', 'y1')
g.addV('Some').property('x', 'x2').property('y', '??')
g.tx().commit()

Expected behaviour:
g.V().has("x","x1").has("y",P.neq("y1"))
g.V().has("x","x1").has("y",P.eq("y1"))
==>v[4224]
g.V().has("x","x1").has("y",P.neq("y4"))
==>v[4224]

Undocumented behaviour:
g.V().has("x","x2").has("y",P.neq("??"))
==>v[4264]
g.V().has("x","x2").has("y",P.eq("??"))
==>v[4264]
g.V().has("x","x2").has("y",P.neq("y4"))
==>v[4264]
sergeymetallic@...
Hi Marc,
The problem is that JanusGraph uses the tokenized field for "neq" comparisons and the non-tokenized field for "eq". For example, for a property "x" the tokenized field in ES has the name "x" and the non-tokenized one "x_STRING". Instead of "??", the value can be anything that contains a space (like "this is a simple text"), and it already will not work.
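The difference between the two fields can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not Elasticsearch or JanusGraph code: analyze() only approximates the standard analyzer (lowercase, split on non-alphanumerics), and the not-equal filter on the tokenized field is modelled as "exclude the document if it shares any token with the analyzed query", which is an assumption chosen because it is consistent with the transcripts in this thread. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class NeqFieldSketch {
    // Rough stand-in for an analyzer: lowercase, then split on runs of characters
    // that are neither letters nor digits. A value like "??" yields no tokens.
    static List<String> analyze(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Assumed model of neq against the tokenized field "x": the query string is
    // analyzed too, and a document is excluded as soon as it shares one token with it.
    static boolean neqOnTokenized(String stored, String query) {
        List<String> storedTokens = analyze(stored);
        for (String q : analyze(query)) {
            if (storedTokens.contains(q)) {
                return false; // excluded from the result
            }
        }
        return true; // kept, including when the query produces no tokens at all
    }

    // neq against the non-tokenized field "x_STRING": plain string inequality.
    static boolean neqOnKeyword(String stored, String query) {
        return !stored.equals(query);
    }

    public static void main(String[] args) {
        String[][] cases = {
            {"??", "??"},                                  // y="??" from the report
            {"watch the dog", "watch"},                    // single word of a stored phrase
            {"a different text", "this is a simple text"}, // query containing spaces
        };
        for (String[] c : cases) {
            System.out.printf("stored=\"%s\" neq(\"%s\"): tokenized=%b, x_STRING=%b%n",
                    c[0], c[1], neqOnTokenized(c[0], c[1]), neqOnKeyword(c[0], c[1]));
        }
    }
}
```

Under this model, neq("??") keeps the "??" vertex because the query yields no tokens, and neq on a value with spaces drops unrelated vertices that merely share a word; the x_STRING column gives plain string inequality in every case.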
hadoopmarc@...
Hi Sergey,
The mere mortals skimming over the questions in this forum often need very explicit examples to fully grasp a point. The transcript below, which expands on the earlier one above, shows the exact consequence of your statement 'problem is that Janusgraph uses tokenised field for "neq" comparisons and non tokenised for "eq".' According to the reference docs, the eq(), neq(), textPrefix(), textRegex() and textFuzzy() predicates should apply to STRING search (that is, to the non-tokenized field).

gremlin> g.addV('Some').property('x','watch the dog')
==>v[4192]
gremlin> g.tx().commit()
==>null
gremlin> g.V().elementMap()
10:03:40 WARN org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Query requires iterating over all vertices [()]. For better performance, use indexes
==>[id:4192,label:Some,x:watch the dog]
==>[id:4264,label:Some,x:x2,y:??]
==>[id:4224,label:Some,x:x1,y:y1]
gremlin> g.V().has('x', eq('watch')).elementMap()
gremlin>
gremlin> g.V().has('x', eq('watch the dog')).elementMap()
==>[id:4192,label:Some,x:watch the dog]
gremlin> g.V().has('x', neq('watch the dog')).elementMap()
==>[id:4264,label:Some,x:x2,y:??]
==>[id:4224,label:Some,x:x1,y:y1]
gremlin> g.V().has('x', neq('watch')).elementMap()
==>[id:4264,label:Some,x:x2,y:??]
==>[id:4224,label:Some,x:x1,y:y1]
// Here, ==>[id:4192,label:Some,x:watch the dog] is missing, supporting Sergey's issue!!!

Related to this, there is no negation of the textContains() predicate for full TEXT search. Using the generic TinkerPop predicate TextP.notContaining() causes JanusGraph not to use the index. I will post an issue on GitHub referring to this thread.

Best wishes,
Marc
hadoopmarc@...
https://github.com/JanusGraph/janusgraph/issues/2588
For further explicitness, I added the following example:
gremlin> g.V().has('x', neq('lion')).elementMap()
==>[id:4264,label:Some,x:x2,y:??]
==>[id:4224,label:Some,x:x1,y:y1]
==>[id:4192,label:Some,x:watch the dog]

On Sun, Apr 25, 2021 at 09:42 AM, <hadoopmarc@...> wrote:
gremlin> g.V().has('x', neq('watch the dog')).elementMap()