Character case behaviour different with or without indices


ni...@...
 

This is odd behaviour in JanusGraph.

If a vertex has a user property with the value Robert, we can't find it with a regex search for "Rob", but we can with a search for "rob".
 
gremlin> mgmt = graph.openManagement()
==>org.janusgraph.graphdb.database.management.ManagementSystem@793cef95
gremlin> i = mgmt.getGraphIndex('userStringKey')
==>userStringKey
gremlin> i.getProperties()
==>unique=false
==>indexedElement=interface org.janusgraph.core.JanusGraphVertex
==>class=class org.janusgraph.graphdb.database.management.JanusGraphIndexWrapper
==>fieldKeys=[Lorg.janusgraph.core.PropertyKey;@72110818
==>mixedIndex=true
==>baseIndex=userStringKey
==>backingIndex=search
==>compositeIndex=false


gremlin> g.V().has('user', textRegex('.*Rob.*')).properties()
gremlin> g.V().has('user', textRegex('.*rob.*')).properties()
==>vp[user->Robert]
==>vp[v_key->0G4WNqlqyv-J1PTmYnF2]
==>vp[type->user]
gremlin> g.V().has('user', textRegex('robert')).properties()
==>vp[user->Robert]
==>vp[v_key->0G4WNqlqyv-J1PTmYnF2]
==>vp[type->user]
gremlin> g.V().has('user', textRegex('Robert')).properties()
gremlin> g.V().has('user', textRegex('.obert')).properties()
==>vp[user->Robert]
==>vp[v_key->0G4WNqlqyv-J1PTmYnF2]
==>vp[type->user]



If there is no index, the behaviour is different:
 
graph = JanusGraphFactory.open('/tmp/tmp.prop')
g = graph.traversal()
v1 = graph.addVertex('name', '1')
v2 = graph.addVertex('name', '2')
edges = ['xxx', 'xxx.yyy', 'Xxx.yyy', '111.222', 'abcdef'].collect {
    v1.addEdge('relates', v2, 'p', it)
}
gremlin> g.E().has('p', textRegex('.*X.*')).properties()
==>p[p->Xxx.yyy]
gremlin> g.E().has('p', textRegex('.*Xxx.*')).properties()
==>p[p->Xxx.yyy]
gremlin> g.E().has('p', textRegex('.*xxx.*')).properties()
==>p[p->xxx]
==>p[p->xxx.yyy]

Is this a bug, or is there some subtlety I missed?


ni...@...
 

I should point out that I am using a build from master 0.2.0. 


tpr...@...
 

Which indexing backend do you use?


On Monday, 5 June 2017 at 18:53:40 UTC+2, Nigel Brown wrote:
I should point out that I am using a build from master 0.2.0. 


Nigel Brown <ni...@...>
 

Elasticsearch 5.1.1


tpr...@...
 

Are you using the DEFAULT mapping, or no mapping at all?
If so, both cases fall back to the TEXT mapping, which is bound to the Elasticsearch text datatype (https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html). Text fields are analyzed, and their terms are lowercased by default.

So if you use the STRING mapping instead, which is bound to the Elasticsearch keyword datatype and leaves values untouched by default, it should work.
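
To make the difference concrete, here is a minimal stand-alone Java sketch. It does not call JanusGraph or Elasticsearch; it only imitates the two behaviours described above, assuming a text field is stored as lowercased word tokens, a keyword field is stored as the original string, and a regex has to match a stored term in full. The class name, the method names and the simplistic tokenizer are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class MappingSketch {

    // Imitates a TEXT mapping: split the value into words and lowercase them.
    // Assumption: this crude split stands in for the default text analysis.
    public static List<String> textTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Imitates a STRING mapping: the value is kept as one untouched term.
    public static List<String> stringTerms(String value) {
        return Collections.singletonList(value);
    }

    // A regex is a hit when it matches at least one stored term in full.
    public static boolean matches(List<String> terms, String regex) {
        Pattern pattern = Pattern.compile(regex);
        return terms.stream().anyMatch(t -> pattern.matcher(t).matches());
    }

    public static void main(String[] args) {
        String user = "Robert";
        String[] regexes = {".*Rob.*", ".*rob.*", "Robert", "robert", ".obert"};
        for (String regex : regexes) {
            System.out.println(regex
                    + "  TEXT=" + matches(textTerms(user), regex)
                    + "  STRING=" + matches(stringTerms(user), regex));
        }
    }
}
```

Under these assumptions the TEXT column follows the same pattern as the indexed queries in the first post (only the lowercase and the ".obert" patterns hit), while the STRING column is case-sensitive against the original value.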


On Tuesday, 6 June 2017 at 23:24:46 UTC+2, Nigel Brown wrote:
Elasticsearch 5.1.1


ni...@...
 

Great.

I was using both STRING and TEXT indices, and JanusGraph was only picking up the TEXT one. I stopped using the TEXT index and it started to behave as expected.
I had assumed that JanusGraph would use the right index based on the query (textRegex vs textContainsRegex).

Thanks for your help.


tpr...@...
 

If you want to use both, you have to use the TEXTSTRING mapping.
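
As a rough illustration of what "both" means here, the following stand-alone Java sketch imitates the two kinds of predicate side by side. It is not JanusGraph code. It assumes that a string-style predicate such as textRegex is evaluated against the whole, untouched value, and that a full-text predicate such as textContainsRegex is evaluated against individual lowercased words. The class name, the method names and the sample value "Robert Smith" are hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.regex.Pattern;

public class TextStringSketch {

    // String-style predicate (in the spirit of textRegex):
    // the regex must match the whole, untouched value.
    public static boolean wholeValueRegex(String value, String regex) {
        return Pattern.matches(regex, value);
    }

    // Full-text-style predicate (in the spirit of textContainsRegex):
    // the regex must match at least one lowercased word of the value.
    // Assumption: a crude split on non-alphanumerics stands in for text analysis.
    public static boolean anyWordRegex(String value, String regex) {
        return Arrays.stream(value.split("[^\\p{L}\\p{N}]+"))
                .filter(word -> !word.isEmpty())
                .map(word -> word.toLowerCase(Locale.ROOT))
                .anyMatch(word -> Pattern.matches(regex, word));
    }

    public static void main(String[] args) {
        String value = "Robert Smith"; // hypothetical property value
        System.out.println("whole value, 'Robert.*': " + wholeValueRegex(value, "Robert.*"));
        System.out.println("whole value, 'smith':    " + wholeValueRegex(value, "smith"));
        System.out.println("any word,    'smith':    " + anyWordRegex(value, "smith"));
        System.out.println("any word,    'Smith':    " + anyWordRegex(value, "Smith"));
    }
}
```

The point of the sketch is only that the two predicate families look at different stored forms of the same value, which is why a single mapping that keeps both forms is needed to serve both.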


ni...@...
 

Ok, thanks. That is good to know.