No results returned with duplicate Has steps in a vertex-search traversal


Patrick Streifel
 

We are running into a JanusGraph bug where a traversal that should return a list of vertices is returning an empty list.

 

Here is some background info:

Using a JanusGraph Server with ConfiguredGraphFactory, running v. 0.5.2.

Storage: Cassandra v. 3.11.9

Index: Elasticsearch v. 6.7.2

Connecting to the server via the Java Gremlin driver.

 

Our use case is this:

 

We are searching for vertices in the graph based on various property filters (e.g. Give me people named "Patrick" with a last name matching the regex "Str.*el"). When we just do this, there are no issues, of course.

 

The tricky part is that we are adding extra filters on a property called DomainGroup, which essentially allows us to filter results per search user based on what they are interested in seeing. The user running the query provides a list of Domains they are interested in, and there has to be some overlap between the user's Domains and the list of DomainGroups on a vertex for that vertex to be returned. In short, we put in extra "has" steps that filter out vertices whose groups do not match.
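To make the intended overlap rule concrete, here is a minimal plain-Java sketch of the check we expect has(DomainGroup, within(...)) to perform on each vertex. It assumes DomainGroup holds several values per vertex; the class and method names are hypothetical and exist only for illustration, and this is not JanusGraph code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, for illustration only; not part of JanusGraph or TinkerPop.
final class DomainOverlap {

    // True when at least one DomainGroup value on the vertex is among the
    // domains the searching user asked for.
    static boolean overlaps(Set<String> userDomains, List<String> vertexDomainGroups) {
        return !Collections.disjoint(userDomains, vertexDomainGroups);
    }

    public static void main(String[] args) {
        Set<String> userDomains = new HashSet<>(Arrays.asList("GROUP_A", "GROUP_B"));

        // Shares GROUP_B with the user's domains, so the vertex should be returned.
        System.out.println(overlaps(userDomains, Arrays.asList("GROUP_B", "GROUP_C")));

        // Shares nothing with the user's domains, so the vertex should be filtered out.
        System.out.println(overlaps(userDomains, Arrays.asList("GROUP_C")));
    }
}
```

Applying the same check twice to the same vertex cannot change the outcome, which is why we expected a duplicated DomainGroup step to be harmless.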

 

Another important note: These "has" steps that filter on Domain are inserted after each of the other steps in the query. That may not be a great idea for this use case, but we have others where we need it. We have logic that automatically groups together a set of "has" steps based on user requests. Sometimes this automated process duplicates certain property searches when constructing the traversal, and that is hard to avoid in certain cases. We could work to deduplicate, but this still seems like a true bug in JanusGraph, albeit for a weird use case.
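For reference, the deduplication workaround mentioned above could look roughly like the sketch below. It assumes the filters are held as simple value objects before the traversal is built; HasFilter and HasFilterDeduplicator are hypothetical names, not our real classes and not a JanusGraph API. Because consecutive "has" steps are ANDed together, dropping an exact repeat should not change which vertices match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;

// Hypothetical value object describing one has(key, predicate(values)) step.
final class HasFilter {
    final String key;
    final String predicate;
    final List<String> values;

    HasFilter(String key, String predicate, String... values) {
        this.key = key;
        this.predicate = predicate;
        this.values = Collections.unmodifiableList(new ArrayList<>(Arrays.asList(values)));
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof HasFilter)) return false;
        HasFilter that = (HasFilter) other;
        return key.equals(that.key)
                && predicate.equals(that.predicate)
                && values.equals(that.values);
    }

    @Override
    public int hashCode() {
        return Objects.hash(key, predicate, values);
    }

    @Override
    public String toString() {
        return "has(" + key + ", " + predicate + "(" + values + "))";
    }
}

final class HasFilterDeduplicator {

    // Keeps the first occurrence of each filter and preserves the original order.
    static List<HasFilter> deduplicate(List<HasFilter> filters) {
        return new ArrayList<>(new LinkedHashSet<>(filters));
    }

    public static void main(String[] args) {
        List<HasFilter> filters = Arrays.asList(
                new HasFilter("FirstName", "textRegex", "Patric.*"),
                new HasFilter("DomainGroup", "within", "GROUP_A", "GROUP_B"),
                new HasFilter("PersonSurName", "textRegex", "Str.*el"),
                new HasFilter("DomainGroup", "within", "GROUP_A", "GROUP_B"));

        // The second DomainGroup filter is an exact repeat and is dropped.
        for (HasFilter filter : deduplicate(filters)) {
            System.out.println(filter);
        }
    }
}
```

Note that this comparison is order-sensitive on the value list, so within([GROUP_A, GROUP_B]) and within([GROUP_B, GROUP_A]) would be treated as different filters. Sorting the values before comparing would be one way to tighten that.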

 

An example of one of our DomainGroup "has" steps is here:

has(DomainGroup, within([GROUP_A, GROUP_B])),

 

We combed through our DEBUG level logs in the JG Server.

We noticed that JG was querying the Elasticsearch index for results, as expected. Elasticsearch was actually returning the expected vertex(es), but the JG Server was not returning anything after that.

 

Here are some additional conditions we noticed:

  1. This appears only to happen when there are multiple duplicate "has" steps in the traversal.
    1. When we run a traversal with only one property search ( has(FirstName, textRegex(Patric.*)) ) and one DomainGroup filter ( has(DomainGroup , within([GROUP_A, GROUP_B])) ), then we get the expected results.
    2. When we provide two property searches, and thus two (duplicate) DomainGroup filters, we get no results. This leads us to believe there is an issue with having duplicate "has" steps, or specifically duplicate "has" steps with "within" filters.

 

Example of a traversal that we get empty results with:

args={gremlin=[[], [V(),

has(FirstName, textRegex(Patric.*)),

has(DomainGroup , within([GROUP_A, GROUP_B])),

has(PersonSurName, textRegex(Str.*el)),

has(DomainGroup , within([GROUP_A, GROUP_B])),

limit(5), valueMap(), with(~tinkerpop.valueMap.tokens)]], aliases={g=my_graph_traversal}}

 

Logs show the Elasticsearch scroll request returning a document with the correct id, but the logs also show JG ultimately sending an empty response to our API. Something is lost in between.

 

Just wanted to bring this to your attention. We are figuring out workarounds on our side, but this seems like a JG bug.


BO XUAN LI
 

Hi,

Can you provide more info on how the fields in your example are indexed? E.g., are they in composite or mixed indexes, and which indexes involve any of these fields?

(Replying to the message Patrick Streifel <prstreifel@...> sent on Tuesday, January 26, 2021 at 4:56 AM.)



Patrick Streifel
 

Hi,

We have a mixed Elasticsearch index that indexes every vertex property on our graph. 
For the above example, all of the fields are indexed as a "keyword" string in Elasticsearch. 
We sometimes get inconsistent behavior. For example, if I query by one field with a "keyword" mapping type and another with a "text" mapping type, instead of by two keyword fields (FullName and PersonSurName), I get the desired result back. 
Again, I can see in the logs that the correct records are returned by our Elasticsearch index, regardless of the example. They are just not being returned by JanusGraph after that.
We have no composite indices on our graph. 

Thanks!


BO XUAN LI
 

Just created a simple test case but couldn’t reproduce:

@Test
public void testDuplicateMixedIndexQuery() {
    // Two STRING-mapped keys, each in its own mixed index
    final PropertyKey name = makeKey("name", String.class);
    final PropertyKey prop = makeKey("prop", String.class);
    mgmt.buildIndex("mixed", Vertex.class).addKey(name, Mapping.STRING.asParameter()).buildMixedIndex(INDEX);
    mgmt.buildIndex("mixed2", Vertex.class).addKey(prop, Mapping.STRING.asParameter()).buildMixedIndex(INDEX);
    finishSchema();

    tx.addVertex("name", "bob", "prop", "val");
    tx.commit();

    // Reopen with index usage forced, so the queries cannot fall back to a full scan
    clopen(option(FORCE_INDEX_USAGE), true);
    newTx();
    // Single "within" filter
    assertTrue(tx.traversal().V().has("prop", "val").has("name", P.within("bob","alice")).hasNext());
    // Same "within" filter repeated twice
    assertTrue(tx.traversal().V().has("prop", "val").has("name", P.within("bob","alice")).has("name", P.within("bob","alice")).hasNext());
}

Would it be possible for you to narrow down the scope, e.g. removing other “has” steps in your query? It would be helpful if you could write a piece of minimal reproducible test code.

(Replying to the message Patrick Streifel <prstreifel@...> sent on Feb 4, 2021, at 3:43 AM.)
