P.neq() predicate uses wrong ES mapping


sergeymetallic@...
 

JanusGraph setup:
Storage backend: Scylla 3
Indexing backend: Elasticsearch 6
JG version: 0.5.3

Steps to reproduce:

1) Create a mixed (Elasticsearch) vertex index with two properties, "x" and "y", both mapped as TEXTSTRING
2) Insert a vertex with the values x="anyvalue", y="??"
3) Execute these queries:
  • g.V().has("x","anyvalue").has("y",P.neq("??"))
  • g.V().has("x","anyvalue").has("y",P.eq("??"))

Expected result:
  • First query returns an empty set
  • Second query returns one node

Actual result:
  • Both queries return the same result


Observation:
It looks like the issue is in this line: https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-es/src/main/java/org/janusgraph/diskstorage/es/ElasticSearchIndex.java#L959
The code checks for Cmp.EQUAL but not for Cmp.NOT_EQUAL, so for NOT_EQUAL the tokenized field is used.
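To make the suspected branch concrete, here is a minimal Java sketch of the field-selection decision as described above. It is not the code of ElasticSearchIndex: the `Predicate` enum, the `STRING_SUFFIX` constant and both methods are hypothetical stand-ins, and the "_STRING" suffix follows the "x" / "x_STRING" naming mentioned further down in this thread.

```java
public class FieldSelectionSketch {

    // Hypothetical stand-in for the predicates involved; not JanusGraph's Cmp or Text enums.
    enum Predicate { EQUAL, NOT_EQUAL, TEXT_CONTAINS }

    // Hypothetical suffix of the non-tokenized copy of a TEXTSTRING field ("y" -> "y_STRING").
    static final String STRING_SUFFIX = "_STRING";

    // Behaviour as reported in this thread: only EQUAL is routed to the non-tokenized field,
    // so NOT_EQUAL falls through to the tokenized field.
    static String fieldAsReported(String key, Predicate p) {
        return p == Predicate.EQUAL ? key + STRING_SUFFIX : key;
    }

    // Suggested behaviour: EQUAL and NOT_EQUAL both compare whole values, so both
    // go to the non-tokenized field; full-text predicates keep the tokenized one.
    static String fieldSuggested(String key, Predicate p) {
        boolean wholeValue = p == Predicate.EQUAL || p == Predicate.NOT_EQUAL;
        return wholeValue ? key + STRING_SUFFIX : key;
    }

    public static void main(String[] args) {
        System.out.println(fieldAsReported("y", Predicate.EQUAL));     // y_STRING
        System.out.println(fieldAsReported("y", Predicate.NOT_EQUAL)); // y
        System.out.println(fieldSuggested("y", Predicate.NOT_EQUAL));  // y_STRING
    }
}
```

The point of the sketch is only the asymmetry: eq and neq are both whole-value comparisons, so under this reading they should target the same field.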


hadoopmarc@...
 

Hi Sergey,

I think I see your point, but for completeness, could you be explicit about step 1) and post your mgmt.buildIndex() statements?

Best wishes, 

Marc


sergeymetallic@...
 

Hi Marc,

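Something like this:

import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.janusgraph.core.schema.Mapping;

var xproperty = janusGraphManagement.makePropertyKey("x").dataType(String.class).make();
var yproperty = janusGraphManagement.makePropertyKey("y").dataType(String.class).make();

// Both keys go into one mixed index, each mapped as TEXTSTRING
var index = janusGraphManagement.buildIndex("indexname", Vertex.class);
index.addKey(xproperty, Mapping.TEXTSTRING.asParameter());
index.addKey(yproperty, Mapping.TEXTSTRING.asParameter());
index.buildMixedIndex("search"); // "search" is the index backend name; adjust to your configuration
janusGraphManagement.commit();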
 


hadoopmarc@...
 

Hi Sergey,

The example string "??" you used is not an ordinary string: apparently, somewhere in Elasticsearch it is interpreted as a wildcard. As my transcript below shows, with a different property value the index behaves as both of us expect. I tried to escape the question marks in your example string, e.g. as "\\?", but without success. The JanusGraph documentation says very little about the use of wildcards with indexing backends.

Best wishes, Marc

bin/janusgraph.sh start
bin/gremlin.sh
graph = JanusGraphFactory.open('conf/janusgraph-cql-es.properties')

mgmt = graph.openManagement()
index = mgmt.buildIndex("indexname", Vertex.class)
xproperty = mgmt.makePropertyKey("x").dataType(String.class).make();
yproperty = mgmt.makePropertyKey("y").dataType(String.class).make();
index.addKey(xproperty, Mapping.TEXTSTRING.asParameter())
index.addKey(yproperty, Mapping.TEXTSTRING.asParameter())
index.buildMixedIndex("search")
mgmt.commit()
ManagementSystem.awaitGraphIndexStatus(graph, 'indexname').status(SchemaStatus.REGISTERED, SchemaStatus.ENABLED).call()
==>GraphIndexStatusReport[success=true, indexName='indexname', targetStatus=[REGISTERED, ENABLED], notConverged={}, converged={x=ENABLED, y=ENABLED}, elapsed=PT0.017S]

g = graph.traversal()
g.addV('Some').property('x', 'x1').property('y', 'y1')
g.addV('Some').property('x', 'x2').property('y', '??')
g.tx().commit()


Expected behaviour:
g.V().has("x","x1").has("y",P.neq("y1"))
// (empty result)
g.V().has("x","x1").has("y",P.eq("y1"))
==>v[4224]
g.V().has("x","x1").has("y",P.neq("y4"))
==>v[4224]

Undocumented behaviour:
g.V().has("x","x2").has("y",P.neq("??"))
==>v[4264]
g.V().has("x","x2").has("y",P.eq("??"))
==>v[4264]
g.V().has("x","x2").has("y",P.neq("y4"))
==>v[4264]


sergeymetallic@...
 

Hi Marc, 

The problem is that JanusGraph uses the tokenized field for "neq" comparisons and the non-tokenized field for "eq". For example, for a property "x" the tokenized field in ES is named "x" and the non-tokenized one "x_STRING". Instead of "??" you can use any value that contains a space (like "this is a simple text"), and it already fails.
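A toy model can make the effect visible. The Java sketch below is not JanusGraph or Elasticsearch code; it assumes (1) a simplified tokenizer that lowercases and splits on anything that is not a letter or digit, which only approximates an Elasticsearch analyzer, and (2) that neq on the tokenized field excludes a stored value only when it shares at least one token with the query value. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class NeqFieldModel {

    // Simplified stand-in for a text analyzer: lowercase, split on runs of
    // characters that are not letters or digits, drop empty pieces.
    static Set<String> tokenize(String s) {
        Set<String> tokens = new HashSet<>();
        for (String t : s.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Model of neq on the tokenized field "x": keep the value unless it shares a token
    // with the query. A query without tokens (e.g. "??") excludes nothing.
    static boolean neqTokenized(String stored, String query) {
        Set<String> common = tokenize(stored);
        common.retainAll(tokenize(query));
        return common.isEmpty();
    }

    // Model of neq on the non-tokenized field "x_STRING": plain whole-value inequality.
    static boolean neqString(String stored, String query) {
        return !stored.equals(query);
    }

    // Returns the stored values that pass neq(query) on the chosen field.
    static List<String> filter(List<String> values, String query, boolean tokenized) {
        List<String> kept = new ArrayList<>();
        for (String v : values) {
            if (tokenized ? neqTokenized(v, query) : neqString(v, query)) {
                kept.add(v);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> x = Arrays.asList("watch the dog", "x2", "x1");
        System.out.println(filter(x, "watch", true));                 // [x2, x1]
        System.out.println(filter(x, "watch", false));                // [watch the dog, x2, x1]
        System.out.println(filter(Arrays.asList("??"), "??", true));  // [??]
        System.out.println(filter(Arrays.asList("??"), "??", false)); // []
    }
}
```

In this model, neq('watch') drops "watch the dog" on the tokenized field but keeps it on the exact field, and "??" survives neq('??') on the tokenized field because it produces no tokens at all; both match the console transcripts in this thread.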


hadoopmarc@...
 

Hi Sergey,

Mere mortals skimming the questions in this forum often need very explicit examples to fully grasp a point. The transcript below, which expands on the earlier one, shows the exact consequence of your statement that JanusGraph uses the tokenized field for "neq" comparisons and the non-tokenized field for "eq".

According to the reference docs, the eq(), neq(), textPrefix(), textRegex() and textFuzzy() predicates should apply to STRING search, i.e. to the non-tokenized field.

gremlin> g.addV('Some').property('x','watch the dog')
==>v[4192]
gremlin> g.tx().commit()
==>null
gremlin> g.V().elementMap()
10:03:40 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [()]. For better performance, use indexes
==>[id:4192,label:Some,x:watch the dog]
==>[id:4264,label:Some,x:x2,y:??]
==>[id:4224,label:Some,x:x1,y:y1]

gremlin> g.V().has('x', eq('watch')).elementMap()
gremlin>
gremlin> g.V().has('x', eq('watch the dog')).elementMap()
==>[id:4192,label:Some,x:watch the dog]

gremlin> g.V().has('x', neq('watch the dog')).elementMap()
==>[id:4264,label:Some,x:x2,y:??]
==>[id:4224,label:Some,x:x1,y:y1]

gremlin> g.V().has('x', neq('watch')).elementMap()
==>[id:4264,label:Some,x:x2,y:??]
==>[id:4224,label:Some,x:x1,y:y1]
// Here, ==>[id:4192,label:Some,x:watch the dog] is missing, supporting Sergey's issue!!!

Related to this, there is no negation of the textContains() predicate for full TEXT search. Using TinkerPop's generic TextP.notContaining() predicate instead causes JanusGraph to skip the index.

I will post an issue on GitHub referring to this thread.

Best wishes, Marc


hadoopmarc@...
 

https://github.com/JanusGraph/janusgraph/issues/2588

To make it even more explicit, I added the following example:

gremlin> g.V().has('x', neq('lion')).elementMap()
==>[id:4264,label:Some,x:x2,y:??]
==>[id:4224,label:Some,x:x1,y:y1]
==>[id:4192,label:Some,x:watch the dog]


On Sun, Apr 25, 2021 at 09:42 AM, <hadoopmarc@...> wrote: