Potential transaction issue (JG 0.6.0)


sergeymetallic@...
 
Edited

The issue can be reproduced within certain conditions, I cannot find a recipe to reproduce it on any environment
We have a query of type 
g.inject((int) 1).union(...).limit(5L)

We have several subqueries in "union" that return a large amount of data. While executing this query we get an error 
"java.lang.NullPointerException: null
at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.getInternalVertex(StandardJanusGraphTx.java:508)
at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.lambda$new$6(StandardJanusGraphTx.java:1478)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.stream.SliceOps$1$1.accept(SliceOps.java:199)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
at java.base/java.util.stream.ReferencePipeline$11$1.accept(ReferencePipeline.java:442)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1812)
at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:294)
at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:169)
at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
at java.base/java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
at org.janusgraph.graphdb.util.SubqueryIterator.computeNext(SubqueryIterator.java:75)
at org.janusgraph.graphdb.util.SubqueryIterator.computeNext(SubqueryIterator.java:37)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
at org.janusgraph.graphdb.query.LimitAdjustingIterator.hasNext(LimitAdjustingIterator.java:71)
at org.janusgraph.graphdb.query.ResultSetIterator.nextInternal(ResultSetIterator.java:55)
at org.janusgraph.graphdb.query.ResultSetIterator.next(ResultSetIterator.java:70)
at org.janusgraph.graphdb.query.ResultSetIterator.next(ResultSetIterator.java:29)
at org.janusgraph.graphdb.util.CloseableIteratorUtils$1.computeNext(CloseableIteratorUtils.java:50)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
at org.janusgraph.graphdb.util.ProfiledIterator.computeNext(ProfiledIterator.java:41)
at org.janusgraph.graphdb.util.ProfiledIterator.computeNext(ProfiledIterator.java:27)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
at org.janusgraph.graphdb.util.MultiIterator.computeNext(MultiIterator.java:42)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
at org.janusgraph.graphdb.util.MultiDistinctUnorderedIterator.computeNext(MultiDistinctUnorderedIterator.java:48)
at org.janusgraph.graphdb.util.MultiDistinctUnorderedIterator.computeNext(MultiDistinctUnorderedIterator.java:26)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
at org.apache.tinkerpop.gremlin.process.traversal.step.map.GraphStep.processNextStart(GraphStep.java:149)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.ComputerAwareStep$EndStep.processNextStart(ComputerAwareStep.java:82)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.ComputerAwareStep.processNextStart(ComputerAwareStep.java:44)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
at org.apache.tinkerpop.gremlin.process.traversal.step.filter.FilterStep.processNextStart(FilterStep.java:37)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
at org.apache.tinkerpop.gremlin.process.traversal.step.filter.FilterStep.processNextStart(FilterStep.java:37)
at org.apache.tinkerpop.gremlin.process.traversal.step.filter.DedupGlobalStep.processNextStart(DedupGlobalStep.java:107)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
at org.apache.tinkerpop.gremlin.process.traversal.step.map.ScalarMapStep.processNextStart(ScalarMapStep.java:39)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:222)
at org.apache.tinkerpop.gremlin.server.util.TraverserIterator.fillBulker(TraverserIterator.java:69)
at org.apache.tinkerpop.gremlin.server.util.TraverserIterator.hasNext(TraverserIterator.java:56)
at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.handleIterator(TraversalOpProcessor.java:410)
at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.lambda$iterateBytecodeTraversal$0(TraversalOpProcessor.java:222)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
I was able to fix the problem by checking if transaction is open
public InternalVertex getInternalVertex(long vertexId) {
// TODO temporary fix
if (isClosed()) {
return null;
}
//return vertex but potentially check for existence
return vertexCache.get(vertexId, internalVertexRetriever);
}
It was working fine in Janugraph 0.5.3

Important part is that I use "limit" here, without it everything works just fine. Maybe transaction is closing earlier than needed? Also I use remote graph via websocket

Any ideas what there might be wrong?


sergeymetallic@...
 

Looking at the changes in 0.6.0 I think this problem was just hidden in the previous version as resources were not released properly    

private void releaseTransaction() {
        -//TODO: release non crucial data structures to preserve memory?
        isOpen = false;
        graph.closeTransaction(this);
        -vertexCache.close();
        +vertexCache = null;
        +indexCache = null;
        +addedRelations = null;
        +deletedRelations = null;
        +uniqueLocks = null;
        +newVertexIndexEntries = null;
        +newTypeCache = null;
    }


Charles
 

I have also encountered this problem, but I have found a reliable way to reproduce.

The following query works perfectly in the gremlin console (both server and client from Janusgraph 0.6.0 distribution)

gremlin> g.V().has("COMPANY", "companyId", 44507).out("EMPLOYS").has("status", "APPROVED").skip(0).limit(10).elementMap("workerId")
gremlin> g.V().has("COMPANY", "companyId", 44507).out("EMPLOYS").has("status", "APPROVED").order().by("lastName").by("firstName").skip(0).limit(10).elementMap("workerId")

In Java this query fails with the same exception as in your trace, if I pass offset = 0 (zero) it seems to work sometimes fiddling with the offset (3,5,10 and so on)
 
return traversal.V().has(VertexType.COMPANY.name(), CompanyWrapper.PROP_COMPANY_ID, companyId)
.out(EdgeType.EMPLOYS.name())
.has(WorkerWrapper.PROP_STATUS, WORKER_STATUS_APPROVED)
.skip(offset)
.limit(limit)
.elementMap(properties)
.toStream()
.map(WorkerWrapper::of);

To make the query succeed I have to either remove the skip() and limit() or order the results before skipping and limiting i.e.

return traversal.V().has(VertexType.COMPANY.name(), CompanyWrapper.PROP_COMPANY_ID, companyId)
.out(EdgeType.EMPLOYS.name())
.has(WorkerWrapper.PROP_STATUS, WORKER_STATUS_APPROVED)
.order()
.by(WorkerWrapper.PROP_LAST_NAME).by(WorkerWrapper.PROP_FIRST_NAME)
.skip(offset)
.limit(limit)
.elementMap(properties)
.toStream()
.map(WorkerWrapper::of);
 
Quite a number of my queries depends on this style of code, start with a known node, traverse edges, has clause and then skip and limit.

9674 [gremlin-server-exec-2] WARN  org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor  - Exception processing a Traversal on iteration for request [6be9a75f-fa5d-4ef1-a61e-7d133bca33c8].
2021-10-15T21:32:09.932504000Z java.lang.NullPointerException
2021-10-15T21:32:09.932569000Z at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.getInternalVertex(StandardJanusGraphTx.java:508)
2021-10-15T21:32:09.932611000Z at org.janusgraph.graphdb.query.vertex.VertexLongList.get(VertexLongList.java:72)
2021-10-15T21:32:09.932652000Z at org.janusgraph.graphdb.query.vertex.VertexLongList$1.next(VertexLongList.java:144)
2021-10-15T21:32:09.932697000Z at org.janusgraph.graphdb.query.vertex.VertexLongList$1.next(VertexLongList.java:131)
2021-10-15T21:32:09.932734000Z at org.apache.tinkerpop.gremlin.process.traversal.step.map.FlatMapStep.processNextStart(FlatMapStep.java:45)
2021-10-15T21:32:09.932774000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
2021-10-15T21:32:09.932812000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
2021-10-15T21:32:09.932852000Z at org.apache.tinkerpop.gremlin.process.traversal.step.filter.FilterStep.processNextStart(FilterStep.java:37)
2021-10-15T21:32:09.932890000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
2021-10-15T21:32:09.932927000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
2021-10-15T21:32:09.932961000Z at org.apache.tinkerpop.gremlin.process.traversal.step.filter.FilterStep.processNextStart(FilterStep.java:37)
2021-10-15T21:32:09.932995000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
2021-10-15T21:32:09.933033000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
2021-10-15T21:32:09.933068000Z at org.apache.tinkerpop.gremlin.process.traversal.step.map.ScalarMapStep.processNextStart(ScalarMapStep.java:39)
2021-10-15T21:32:09.933109000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
2021-10-15T21:32:09.933143000Z at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:222)
2021-10-15T21:32:09.933188000Z at org.apache.tinkerpop.gremlin.server.util.TraverserIterator.fillBulker(TraverserIterator.java:69)
2021-10-15T21:32:09.933228000Z at org.apache.tinkerpop.gremlin.server.util.TraverserIterator.hasNext(TraverserIterator.java:56)
2021-10-15T21:32:09.933265000Z at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.handleIterator(TraversalOpProcessor.java:410)
2021-10-15T21:32:09.933299000Z at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.lambda$iterateBytecodeTraversal$0(TraversalOpProcessor
 
 


Boxuan Li
 

I agree with Sergey that "this problem was just hidden in the previous version as resources were not released properly".

I tried to reproduce in Java (not remote graph) but failed. @Charles, are you able to release the complete recipe of your code, or spot anything that I am missing?

My code is as follows (you can put it in JanusGraphTest.java and run):

@Test
public void testTransactionIssue() {
JanusGraphVertex v1 = tx.addVertex(T.label, "company", "companyId", 44507);
JanusGraphVertex v2 = tx.addVertex("status", "APPROVED", "workerId", 123, "lastName", "A", "firstName", "aa");
JanusGraphVertex v3 = tx.addVertex("status", "APPROVED", "workerId", 124, "lastName", "C", "firstName", "a");
JanusGraphVertex v4 = tx.addVertex("status", "APPROVED", "workerId", 125, "lastName", "aa", "firstName", "C");
v1.addEdge("EMPLOYS", v2);
v1.addEdge("EMPLOYS", v3);
v1.addEdge("EMPLOYS", v4);
tx.commit();
newTx();
List list = tx.traversal().V().has("company", "companyId", 44507).out("EMPLOYS").has("status", "APPROVED").skip(0).limit(1).elementMap("workerId").toList();
tx.traversal().V().has("company", "companyId", 44507).out("EMPLOYS").has("status", "APPROVED").skip(1).limit(1).elementMap("workerId").toList();
tx.traversal().V().has("company", "companyId", 44507).out("EMPLOYS").has("status", "APPROVED").skip(0).limit(2).elementMap("workerId").toList();
tx.rollback();
tx = graph.newTransaction();
tx.traversal().V().has("companyId", 44507).out("EMPLOYS").has("status", "APPROVED").skip(0).limit(1).elementMap("workerId").toList();
tx.traversal().V().has("companyId", 44507).out("EMPLOYS").has("status", "APPROVED").order().by("lastName").by("firstName").skip(1).limit(1).elementMap("workerId").toList();


graph.traversal().V().has("companyId", 44507).out("EMPLOYS").has("status", "APPROVED").skip(0).limit(1).elementMap("workerId").toList();
graph.traversal().V().has("companyId", 44507).out("EMPLOYS").has("status", "APPROVED").order().by("lastName").by("firstName").skip(1).limit(1).elementMap("workerId").toList();
}


Boxuan Li
 

I guess this has something to do with race conditions. Although I couldn't reproduce the exact issue, I found a similar race condition that caused an NPE in `expireSchemaElement` method (https://github.com/JanusGraph/janusgraph/issues/2898).  The fix is here: https://github.com/JanusGraph/janusgraph/pull/2899 which will be released in the next minor version (0.6.1). This PR also includes the temporary fix proposed by Sergey:

public InternalVertex getInternalVertex(long vertexId) {
// TODO temporary fix
if (isClosed()) {
return null;
}
//return vertex but potentially check for existence
return vertexCache.get(vertexId, internalVertexRetriever);
}

If anyone is able to find a steady way to reproduce this issue and/or encounter a similar NPE issue somewhere else, please let me know, thanks! A code review is also very welcome.

Best,
Boxuan


Umesh Gade
 

We faced below NPE with JG 0.6.0 while creating new vertex. Not sure if this is related but never seen this exception with earlier JG versions. 

java.lang.NullPointerException: null
        at org.janusgraph.graphdb.types.VertexLabelVertex.isPartitioned(VertexLabelVertex.java:41) ~[janusgraph-core-0.6.0.jar:?]
        at org.janusgraph.graphdb.types.VertexLabelVertex.hasDefaultConfiguration(VertexLabelVertex.java:67) ~[janusgraph-core-0.6.0.jar:?]
        at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.addVertex(StandardJanusGraphTx.java:579) ~[janusgraph-core-0.6.0.jar:?]
        at org.janusgraph.graphdb.tinkerpop.JanusGraphBlueprintsTransaction.addVertex(JanusGraphBlueprintsTransaction.java:127) ~[janusgraph-core-0.6.0.jar:?]
        at org.janusgraph.graphdb.tinkerpop.JanusGraphBlueprintsGraph.addVertex(JanusGraphBlueprintsGraph.java:143) ~[janusgraph-core-0.6.0.jar:?]