Multiple or-steps are conflated when using the textRegex() predicate


Mladen Marović
 

Hello!

I came upon some unexpected behavior when running queries with multiple or() steps and string searches on mixed indexes and would like some clarification if this is intended or not.

I have a graph with a vertex label of type person and an edge label of type sent-message-to. All edges have the properties sender and receiver, and some others. A mixed index backed by elasticsearch is created for the edge label. Both sender and receiver are indexed (as STRING types), as well as some others.

The query that's causing me problems is:

g.E() \
    .hasLabel('sent-message-to') \
    .or( \
        has('sender', textRegex('.*alice.*')), \
        has('receiver', textRegex('.*alice.*')) \
    ).or( \
        has('sender', textRegex('.*bob.*')), \
        has('receiver', textRegex('.*bob.*')) \
    ).toList()

The query should return (roughly) messages between alices and bobs (and some edge cases where an alice bobowitz talks to an eve, but that's not important here). However, I'm getting some unexpected results where, for example, neither the sender nor the recipient contain the substring bob.

The explain plan for the query is as follows:

gremlin> g.E() \
......1>     .hasLabel('sent-message-to') \
......2>     .or( \
......3>         has('sender', textRegex('.*alice.*')), \
......4>         has('receiver', textRegex('.*alice.*')) \
......5>     ).or( \
......6>         has('sender', textRegex('.*bob.*')), \
......7>         has('receiver', textRegex('.*bob.*')) \
......8>     ).explain()
==>Traversal Explanation
=======================================================================================================================================================================================================================================================================================================
Original Traversal                          [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.textRegex(.*alice.*)])], [HasStep([receiver.textRegex(.*alice.*)])]]), OrStep([[HasStep([sender.textRegex(.*bob.*)])], [HasStep([receiver.textRegex(.*bob.*)])]])]

ConnectiveStrategy                    [D]   [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.textRegex(.*alice.*)])], [HasStep([receiver.textRegex(.*alice.*)])]]), OrStep([[HasStep([sender.textRegex(.*bob.*)])], [HasStep([receiver.textRegex(.*bob.*)])]])]
IncidentToAdjacentStrategy            [O]   [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.textRegex(.*alice.*)])], [HasStep([receiver.textRegex(.*alice.*)])]]), OrStep([[HasStep([sender.textRegex(.*bob.*)])], [HasStep([receiver.textRegex(.*bob.*)])]])]
RepeatUnrollStrategy                  [O]   [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.textRegex(.*alice.*)])], [HasStep([receiver.textRegex(.*alice.*)])]]), OrStep([[HasStep([sender.textRegex(.*bob.*)])], [HasStep([receiver.textRegex(.*bob.*)])]])]
MatchPredicateStrategy                [O]   [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.textRegex(.*alice.*)])], [HasStep([receiver.textRegex(.*alice.*)])]]), OrStep([[HasStep([sender.textRegex(.*bob.*)])], [HasStep([receiver.textRegex(.*bob.*)])]])]
PathRetractionStrategy                [O]   [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.textRegex(.*alice.*)])], [HasStep([receiver.textRegex(.*alice.*)])]]), OrStep([[HasStep([sender.textRegex(.*bob.*)])], [HasStep([receiver.textRegex(.*bob.*)])]])]
EarlyLimitStrategy                    [O]   [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.textRegex(.*alice.*)])], [HasStep([receiver.textRegex(.*alice.*)])]]), OrStep([[HasStep([sender.textRegex(.*bob.*)])], [HasStep([receiver.textRegex(.*bob.*)])]])]
FilterRankingStrategy                 [O]   [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.textRegex(.*alice.*)])], [HasStep([receiver.textRegex(.*alice.*)])]]), OrStep([[HasStep([sender.textRegex(.*bob.*)])], [HasStep([receiver.textRegex(.*bob.*)])]])]
InlineFilterStrategy                  [O]   [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.textRegex(.*alice.*)])], [HasStep([receiver.textRegex(.*alice.*)])]]), OrStep([[HasStep([sender.textRegex(.*bob.*)])], [HasStep([receiver.textRegex(.*bob.*)])]])]
AdjacentToIncidentStrategy            [O]   [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.textRegex(.*alice.*)])], [HasStep([receiver.textRegex(.*alice.*)])]]), OrStep([[HasStep([sender.textRegex(.*bob.*)])], [HasStep([receiver.textRegex(.*bob.*)])]])]
CountStrategy                         [O]   [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.textRegex(.*alice.*)])], [HasStep([receiver.textRegex(.*alice.*)])]]), OrStep([[HasStep([sender.textRegex(.*bob.*)])], [HasStep([receiver.textRegex(.*bob.*)])]])]
LazyBarrierStrategy                   [O]   [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.textRegex(.*alice.*)])], [HasStep([receiver.textRegex(.*alice.*)])]]), OrStep([[HasStep([sender.textRegex(.*bob.*)])], [HasStep([receiver.textRegex(.*bob.*)])]])]
AdjacentVertexFilterOptimizerStrategy [P]   [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.textRegex(.*alice.*)])], [HasStep([receiver.textRegex(.*alice.*)])]]), OrStep([[HasStep([sender.textRegex(.*bob.*)])], [HasStep([receiver.textRegex(.*bob.*)])]])]
AdjacentVertexHasIdOptimizerStrategy  [P]   [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.textRegex(.*alice.*)])], [HasStep([receiver.textRegex(.*alice.*)])]]), OrStep([[HasStep([sender.textRegex(.*bob.*)])], [HasStep([receiver.textRegex(.*bob.*)])]])]
AdjacentVertexIsOptimizerStrategy     [P]   [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.textRegex(.*alice.*)])], [HasStep([receiver.textRegex(.*alice.*)])]]), OrStep([[HasStep([sender.textRegex(.*bob.*)])], [HasStep([receiver.textRegex(.*bob.*)])]])]
JanusGraphLocalQueryOptimizerStrategy [P]   [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.textRegex(.*alice.*)])], [HasStep([receiver.textRegex(.*alice.*)])]]), OrStep([[HasStep([sender.textRegex(.*bob.*)])], [HasStep([receiver.textRegex(.*bob.*)])]])]
JanusGraphStepStrategy                [P]   [JanusGraphStep([],[~label.eq(sent-message-to)]).Or(JanusGraphStep([],[sender.textRegex(.*alice.*)]),JanusGraphStep([],[receiver.textRegex(.*alice.*)]),JanusGraphStep([],[sender.textRegex(.*bob.*)]),JanusGraphStep([],[receiver.textRegex(.*bob.*)]))]
JanusGraphIoRegistrationStrategy      [P]   [JanusGraphStep([],[~label.eq(sent-message-to)]).Or(JanusGraphStep([],[sender.textRegex(.*alice.*)]),JanusGraphStep([],[receiver.textRegex(.*alice.*)]),JanusGraphStep([],[sender.textRegex(.*bob.*)]),JanusGraphStep([],[receiver.textRegex(.*bob.*)]))]
ProfileStrategy                       [F]   [JanusGraphStep([],[~label.eq(sent-message-to)]).Or(JanusGraphStep([],[sender.textRegex(.*alice.*)]),JanusGraphStep([],[receiver.textRegex(.*alice.*)]),JanusGraphStep([],[sender.textRegex(.*bob.*)]),JanusGraphStep([],[receiver.textRegex(.*bob.*)]))]
StandardVerificationStrategy          [V]   [JanusGraphStep([],[~label.eq(sent-message-to)]).Or(JanusGraphStep([],[sender.textRegex(.*alice.*)]),JanusGraphStep([],[receiver.textRegex(.*alice.*)]),JanusGraphStep([],[sender.textRegex(.*bob.*)]),JanusGraphStep([],[receiver.textRegex(.*bob.*)]))]

Final Traversal                             [JanusGraphStep([],[~label.eq(sent-message-to)]).Or(JanusGraphStep([],[sender.textRegex(.*alice.*)]),JanusGraphStep([],[receiver.textRegex(.*alice.*)]),JanusGraphStep([],[sender.textRegex(.*bob.*)]),JanusGraphStep([],[receiver.textRegex(.*bob.*)]))]

The final (seemingly incorrect) traversal explains the results I'm getting. However, what strikes me as odd is that the JanusGraphLocalQueryOptimizerStrategy seems to return a correct traversal with two separate or steps:

[
	GraphStep(edge,[]),
	HasStep([~label.eq(sent-message-to)]),
	OrStep([
		[HasStep([sender.textRegex(.*alice.*)])],
		[HasStep([receiver.textRegex(.*alice.*)])]
	]),
	OrStep([
		[HasStep([sender.textRegex(.*bob.*)])],
		[HasStep([receiver.textRegex(.*bob.*)])]
	])
]

but the following JanusGraphStepStrategy conflates the two or steps into a single one:

[
	JanusGraphStep(
		[],[~label.eq(sent-message-to)]
	)
	.Or(
		JanusGraphStep([],[sender.eq(alice)]),
		JanusGraphStep([],[receiver.eq(alice)]),
		JanusGraphStep([],[sender.eq(bob)]),
		JanusGraphStep([],[receiver.eq(bob)])
	)
]

, which should not be correct because (A or B) and (C or D) is not equal to (A or B or C or D).

What's more confusing is that if I replace the textRegex() predicate with the tinkerpop predicate containing(), I get the proper results, because the explain plan is different:

gremlin> g.E() \
......1>     .hasLabel('sent-message-to') \
......2>     .or( \
......3>         has('sender', containing('alice')), \
......4>         has('receiver', containing('alice')) \
......5>     ).or( \
......6>         has('sender', containing('bob')), \
......7>         has('receiver', containing('bob')) \
......8>     ).explain()
==>Traversal Explanation
=======================================================================================================================================================================================================================================================================================
Original Traversal                          [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.containing(alice)])], [HasStep([receiver.containing(alice)])]]), OrStep([[HasStep([sender.containing(bob)])], [HasStep([receiver.containing(bob)])]])]

ConnectiveStrategy                    [D]   [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.containing(alice)])], [HasStep([receiver.containing(alice)])]]), OrStep([[HasStep([sender.containing(bob)])], [HasStep([receiver.containing(bob)])]])]
IncidentToAdjacentStrategy            [O]   [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.containing(alice)])], [HasStep([receiver.containing(alice)])]]), OrStep([[HasStep([sender.containing(bob)])], [HasStep([receiver.containing(bob)])]])]
RepeatUnrollStrategy                  [O]   [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.containing(alice)])], [HasStep([receiver.containing(alice)])]]), OrStep([[HasStep([sender.containing(bob)])], [HasStep([receiver.containing(bob)])]])]
MatchPredicateStrategy                [O]   [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.containing(alice)])], [HasStep([receiver.containing(alice)])]]), OrStep([[HasStep([sender.containing(bob)])], [HasStep([receiver.containing(bob)])]])]
PathRetractionStrategy                [O]   [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.containing(alice)])], [HasStep([receiver.containing(alice)])]]), OrStep([[HasStep([sender.containing(bob)])], [HasStep([receiver.containing(bob)])]])]
EarlyLimitStrategy                    [O]   [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.containing(alice)])], [HasStep([receiver.containing(alice)])]]), OrStep([[HasStep([sender.containing(bob)])], [HasStep([receiver.containing(bob)])]])]
FilterRankingStrategy                 [O]   [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.containing(alice)])], [HasStep([receiver.containing(alice)])]]), OrStep([[HasStep([sender.containing(bob)])], [HasStep([receiver.containing(bob)])]])]
InlineFilterStrategy                  [O]   [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.containing(alice)])], [HasStep([receiver.containing(alice)])]]), OrStep([[HasStep([sender.containing(bob)])], [HasStep([receiver.containing(bob)])]])]
AdjacentToIncidentStrategy            [O]   [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.containing(alice)])], [HasStep([receiver.containing(alice)])]]), OrStep([[HasStep([sender.containing(bob)])], [HasStep([receiver.containing(bob)])]])]
CountStrategy                         [O]   [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.containing(alice)])], [HasStep([receiver.containing(alice)])]]), OrStep([[HasStep([sender.containing(bob)])], [HasStep([receiver.containing(bob)])]])]
LazyBarrierStrategy                   [O]   [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.containing(alice)])], [HasStep([receiver.containing(alice)])]]), OrStep([[HasStep([sender.containing(bob)])], [HasStep([receiver.containing(bob)])]])]
AdjacentVertexFilterOptimizerStrategy [P]   [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.containing(alice)])], [HasStep([receiver.containing(alice)])]]), OrStep([[HasStep([sender.containing(bob)])], [HasStep([receiver.containing(bob)])]])]
AdjacentVertexHasIdOptimizerStrategy  [P]   [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.containing(alice)])], [HasStep([receiver.containing(alice)])]]), OrStep([[HasStep([sender.containing(bob)])], [HasStep([receiver.containing(bob)])]])]
AdjacentVertexIsOptimizerStrategy     [P]   [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.containing(alice)])], [HasStep([receiver.containing(alice)])]]), OrStep([[HasStep([sender.containing(bob)])], [HasStep([receiver.containing(bob)])]])]
JanusGraphLocalQueryOptimizerStrategy [P]   [GraphStep(edge,[]), HasStep([~label.eq(sent-message-to)]), OrStep([[HasStep([sender.containing(alice)])], [HasStep([receiver.containing(alice)])]]), OrStep([[HasStep([sender.containing(bob)])], [HasStep([receiver.containing(bob)])]])]
JanusGraphStepStrategy                [P]   [JanusGraphStep([],[~label.eq(sent-message-to)]), OrStep([[HasStep([sender.containing(alice)])], [HasStep([receiver.containing(alice)])]]), OrStep([[HasStep([sender.containing(bob)])], [HasStep([receiver.containing(bob)])]])]
JanusGraphIoRegistrationStrategy      [P]   [JanusGraphStep([],[~label.eq(sent-message-to)]), OrStep([[HasStep([sender.containing(alice)])], [HasStep([receiver.containing(alice)])]]), OrStep([[HasStep([sender.containing(bob)])], [HasStep([receiver.containing(bob)])]])]
ProfileStrategy                       [F]   [JanusGraphStep([],[~label.eq(sent-message-to)]), OrStep([[HasStep([sender.containing(alice)])], [HasStep([receiver.containing(alice)])]]), OrStep([[HasStep([sender.containing(bob)])], [HasStep([receiver.containing(bob)])]])]
StandardVerificationStrategy          [V]   [JanusGraphStep([],[~label.eq(sent-message-to)]), OrStep([[HasStep([sender.containing(alice)])], [HasStep([receiver.containing(alice)])]]), OrStep([[HasStep([sender.containing(bob)])], [HasStep([receiver.containing(bob)])]])]

Final Traversal                             [JanusGraphStep([],[~label.eq(sent-message-to)]), OrStep([[HasStep([sender.containing(alice)])], [HasStep([receiver.containing(alice)])]]), OrStep([[HasStep([sender.containing(bob)])], [HasStep([receiver.containing(bob)])]])]

and the final traversal contains two or steps:

[
	JanusGraphStep([],[~label.eq(sent-message-to)]),
	OrStep([
		[HasStep([sender.containing(alice)])],
		[HasStep([receiver.containing(alice)])]
	]),
	OrStep([
		[HasStep([sender.containing(bob)])],
		[HasStep([receiver.containing(bob)])]
	])]

I'd like to use the underlying mixed index to fetch the results, and preferably only one index query should be performed under the hood.

Is there a way to force this query to properly use the mixed index? Why is the explain plan in these two cases different?

Kind regards,

Mladen Marović

Join janusgraph-users@lists.lfaidata.foundation to automatically receive all group messages.