Potential transaction issue (JG 0.6.0)


Boxuan Li
 

Hi Umesh,

What you reported might be due to a different cause. Are you able to reproduce it steadily, and if so, could you please share the steps to reproduce the problem? It would be great if you could create a new thread for the problem you reported, thanks!

Best,
Boxuan

On Thu, Jan 27, 2022 at 9:55 PM Umesh Gade <er.umeshgade@...> wrote:
We faced below NPE with JG 0.6.0 while creating new vertex. Not sure if this is related but never seen this exception with earlier JG versions. 

java.lang.NullPointerException: null
        at org.janusgraph.graphdb.types.VertexLabelVertex.isPartitioned(VertexLabelVertex.java:41) ~[janusgraph-core-0.6.0.jar:?]
        at org.janusgraph.graphdb.types.VertexLabelVertex.hasDefaultConfiguration(VertexLabelVertex.java:67) ~[janusgraph-core-0.6.0.jar:?]
        at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.addVertex(StandardJanusGraphTx.java:579) ~[janusgraph-core-0.6.0.jar:?]
        at org.janusgraph.graphdb.tinkerpop.JanusGraphBlueprintsTransaction.addVertex(JanusGraphBlueprintsTransaction.java:127) ~[janusgraph-core-0.6.0.jar:?]
        at org.janusgraph.graphdb.tinkerpop.JanusGraphBlueprintsGraph.addVertex(JanusGraphBlueprintsGraph.java:143) ~[janusgraph-core-0.6.0.jar:?]


Umesh Gade
 

We faced below NPE with JG 0.6.0 while creating new vertex. Not sure if this is related but never seen this exception with earlier JG versions. 

java.lang.NullPointerException: null
        at org.janusgraph.graphdb.types.VertexLabelVertex.isPartitioned(VertexLabelVertex.java:41) ~[janusgraph-core-0.6.0.jar:?]
        at org.janusgraph.graphdb.types.VertexLabelVertex.hasDefaultConfiguration(VertexLabelVertex.java:67) ~[janusgraph-core-0.6.0.jar:?]
        at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.addVertex(StandardJanusGraphTx.java:579) ~[janusgraph-core-0.6.0.jar:?]
        at org.janusgraph.graphdb.tinkerpop.JanusGraphBlueprintsTransaction.addVertex(JanusGraphBlueprintsTransaction.java:127) ~[janusgraph-core-0.6.0.jar:?]
        at org.janusgraph.graphdb.tinkerpop.JanusGraphBlueprintsGraph.addVertex(JanusGraphBlueprintsGraph.java:143) ~[janusgraph-core-0.6.0.jar:?]


Boxuan Li
 

I guess this has something to do with race conditions. Although I couldn't reproduce the exact issue, I found a similar race condition that caused an NPE in `expireSchemaElement` method (https://github.com/JanusGraph/janusgraph/issues/2898).  The fix is here: https://github.com/JanusGraph/janusgraph/pull/2899 which will be released in the next minor version (0.6.1). This PR also includes the temporary fix proposed by Sergey:

public InternalVertex getInternalVertex(long vertexId) {
// TODO temporary fix
if (isClosed()) {
return null;
}
//return vertex but potentially check for existence
return vertexCache.get(vertexId, internalVertexRetriever);
}

If anyone is able to find a steady way to reproduce this issue and/or encounter a similar NPE issue somewhere else, please let me know, thanks! A code review is also very welcome.

Best,
Boxuan


Boxuan Li
 

I agree with Sergey that "this problem was just hidden in the previous version as resources were not released properly".

I tried to reproduce in Java (not remote graph) but failed. @Charles, are you able to release the complete recipe of your code, or spot anything that I am missing?

My code is as follows (you can put it in JanusGraphTest.java and run):

@Test
public void testTransactionIssue() {
JanusGraphVertex v1 = tx.addVertex(T.label, "company", "companyId", 44507);
JanusGraphVertex v2 = tx.addVertex("status", "APPROVED", "workerId", 123, "lastName", "A", "firstName", "aa");
JanusGraphVertex v3 = tx.addVertex("status", "APPROVED", "workerId", 124, "lastName", "C", "firstName", "a");
JanusGraphVertex v4 = tx.addVertex("status", "APPROVED", "workerId", 125, "lastName", "aa", "firstName", "C");
v1.addEdge("EMPLOYS", v2);
v1.addEdge("EMPLOYS", v3);
v1.addEdge("EMPLOYS", v4);
tx.commit();
newTx();
List list = tx.traversal().V().has("company", "companyId", 44507).out("EMPLOYS").has("status", "APPROVED").skip(0).limit(1).elementMap("workerId").toList();
tx.traversal().V().has("company", "companyId", 44507).out("EMPLOYS").has("status", "APPROVED").skip(1).limit(1).elementMap("workerId").toList();
tx.traversal().V().has("company", "companyId", 44507).out("EMPLOYS").has("status", "APPROVED").skip(0).limit(2).elementMap("workerId").toList();
tx.rollback();
tx = graph.newTransaction();
tx.traversal().V().has("companyId", 44507).out("EMPLOYS").has("status", "APPROVED").skip(0).limit(1).elementMap("workerId").toList();
tx.traversal().V().has("companyId", 44507).out("EMPLOYS").has("status", "APPROVED").order().by("lastName").by("firstName").skip(1).limit(1).elementMap("workerId").toList();


graph.traversal().V().has("companyId", 44507).out("EMPLOYS").has("status", "APPROVED").skip(0).limit(1).elementMap("workerId").toList();
graph.traversal().V().has("companyId", 44507).out("EMPLOYS").has("status", "APPROVED").order().by("lastName").by("firstName").skip(1).limit(1).elementMap("workerId").toList();
}


Charles <dostanian@...>
 

I have also encountered this problem, but I have found a reliable way to reproduce.

The following query works perfectly in the gremlin console (both server and client from Janusgraph 0.6.0 distribution)

gremlin> g.V().has("COMPANY", "companyId", 44507).out("EMPLOYS").has("status", "APPROVED").skip(0).limit(10).elementMap("workerId")
gremlin> g.V().has("COMPANY", "companyId", 44507).out("EMPLOYS").has("status", "APPROVED").order().by("lastName").by("firstName").skip(0).limit(10).elementMap("workerId")

In Java this query fails with the same exception as in your trace, if I pass offset = 0 (zero) it seems to work sometimes fiddling with the offset (3,5,10 and so on)
 
return traversal.V().has(VertexType.COMPANY.name(), CompanyWrapper.PROP_COMPANY_ID, companyId)
.out(EdgeType.EMPLOYS.name())
.has(WorkerWrapper.PROP_STATUS, WORKER_STATUS_APPROVED)
.skip(offset)
.limit(limit)
.elementMap(properties)
.toStream()
.map(WorkerWrapper::of);

To make the query succeed I have to either remove the skip() and limit() or order the results before skipping and limiting i.e.

return traversal.V().has(VertexType.COMPANY.name(), CompanyWrapper.PROP_COMPANY_ID, companyId)
.out(EdgeType.EMPLOYS.name())
.has(WorkerWrapper.PROP_STATUS, WORKER_STATUS_APPROVED)
.order()
.by(WorkerWrapper.PROP_LAST_NAME).by(WorkerWrapper.PROP_FIRST_NAME)
.skip(offset)
.limit(limit)
.elementMap(properties)
.toStream()
.map(WorkerWrapper::of);
 
Quite a number of my queries depends on this style of code, start with a known node, traverse edges, has clause and then skip and limit.

9674 [gremlin-server-exec-2] WARN  org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor  - Exception processing a Traversal on iteration for request [6be9a75f-fa5d-4ef1-a61e-7d133bca33c8].
2021-10-15T21:32:09.932504000Z java.lang.NullPointerException
2021-10-15T21:32:09.932569000Z at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.getInternalVertex(StandardJanusGraphTx.java:508)
2021-10-15T21:32:09.932611000Z at org.janusgraph.graphdb.query.vertex.VertexLongList.get(VertexLongList.java:72)
2021-10-15T21:32:09.932652000Z at org.janusgraph.graphdb.query.vertex.VertexLongList$1.next(VertexLongList.java:144)
2021-10-15T21:32:09.932697000Z at org.janusgraph.graphdb.query.vertex.VertexLongList$1.next(VertexLongList.java:131)
2021-10-15T21:32:09.932734000Z at org.apache.tinkerpop.gremlin.process.traversal.step.map.FlatMapStep.processNextStart(FlatMapStep.java:45)
2021-10-15T21:32:09.932774000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
2021-10-15T21:32:09.932812000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
2021-10-15T21:32:09.932852000Z at org.apache.tinkerpop.gremlin.process.traversal.step.filter.FilterStep.processNextStart(FilterStep.java:37)
2021-10-15T21:32:09.932890000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
2021-10-15T21:32:09.932927000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
2021-10-15T21:32:09.932961000Z at org.apache.tinkerpop.gremlin.process.traversal.step.filter.FilterStep.processNextStart(FilterStep.java:37)
2021-10-15T21:32:09.932995000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
2021-10-15T21:32:09.933033000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
2021-10-15T21:32:09.933068000Z at org.apache.tinkerpop.gremlin.process.traversal.step.map.ScalarMapStep.processNextStart(ScalarMapStep.java:39)
2021-10-15T21:32:09.933109000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
2021-10-15T21:32:09.933143000Z at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:222)
2021-10-15T21:32:09.933188000Z at org.apache.tinkerpop.gremlin.server.util.TraverserIterator.fillBulker(TraverserIterator.java:69)
2021-10-15T21:32:09.933228000Z at org.apache.tinkerpop.gremlin.server.util.TraverserIterator.hasNext(TraverserIterator.java:56)
2021-10-15T21:32:09.933265000Z at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.handleIterator(TraversalOpProcessor.java:410)
2021-10-15T21:32:09.933299000Z at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.lambda$iterateBytecodeTraversal$0(TraversalOpProcessor
 
 


sergeymetallic@...
 

Looking at the changes in 0.6.0 I think this problem was just hidden in the previous version as resources were not released properly    

private void releaseTransaction() {
        -//TODO: release non crucial data structures to preserve memory?
        isOpen = false;
        graph.closeTransaction(this);
        -vertexCache.close();
        +vertexCache = null;
        +indexCache = null;
        +addedRelations = null;
        +deletedRelations = null;
        +uniqueLocks = null;
        +newVertexIndexEntries = null;
        +newTypeCache = null;
    }


sergeymetallic@...
 
Edited

The issue can be reproduced within certain conditions, I cannot find a recipe to reproduce it on any environment
We have a query of type 
g.inject((int) 1).union(...).limit(5L)

We have several subqueries in "union" that return a large amount of data. While executing this query we get an error 
"java.lang.NullPointerException: null
at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.getInternalVertex(StandardJanusGraphTx.java:508)
at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.lambda$new$6(StandardJanusGraphTx.java:1478)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.stream.SliceOps$1$1.accept(SliceOps.java:199)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
at java.base/java.util.stream.ReferencePipeline$11$1.accept(ReferencePipeline.java:442)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1812)
at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:294)
at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:169)
at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
at java.base/java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
at org.janusgraph.graphdb.util.SubqueryIterator.computeNext(SubqueryIterator.java:75)
at org.janusgraph.graphdb.util.SubqueryIterator.computeNext(SubqueryIterator.java:37)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
at org.janusgraph.graphdb.query.LimitAdjustingIterator.hasNext(LimitAdjustingIterator.java:71)
at org.janusgraph.graphdb.query.ResultSetIterator.nextInternal(ResultSetIterator.java:55)
at org.janusgraph.graphdb.query.ResultSetIterator.next(ResultSetIterator.java:70)
at org.janusgraph.graphdb.query.ResultSetIterator.next(ResultSetIterator.java:29)
at org.janusgraph.graphdb.util.CloseableIteratorUtils$1.computeNext(CloseableIteratorUtils.java:50)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
at org.janusgraph.graphdb.util.ProfiledIterator.computeNext(ProfiledIterator.java:41)
at org.janusgraph.graphdb.util.ProfiledIterator.computeNext(ProfiledIterator.java:27)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
at org.janusgraph.graphdb.util.MultiIterator.computeNext(MultiIterator.java:42)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
at org.janusgraph.graphdb.util.MultiDistinctUnorderedIterator.computeNext(MultiDistinctUnorderedIterator.java:48)
at org.janusgraph.graphdb.util.MultiDistinctUnorderedIterator.computeNext(MultiDistinctUnorderedIterator.java:26)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
at org.apache.tinkerpop.gremlin.process.traversal.step.map.GraphStep.processNextStart(GraphStep.java:149)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.ComputerAwareStep$EndStep.processNextStart(ComputerAwareStep.java:82)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.ComputerAwareStep.processNextStart(ComputerAwareStep.java:44)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
at org.apache.tinkerpop.gremlin.process.traversal.step.filter.FilterStep.processNextStart(FilterStep.java:37)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
at org.apache.tinkerpop.gremlin.process.traversal.step.filter.FilterStep.processNextStart(FilterStep.java:37)
at org.apache.tinkerpop.gremlin.process.traversal.step.filter.DedupGlobalStep.processNextStart(DedupGlobalStep.java:107)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
at org.apache.tinkerpop.gremlin.process.traversal.step.map.ScalarMapStep.processNextStart(ScalarMapStep.java:39)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:222)
at org.apache.tinkerpop.gremlin.server.util.TraverserIterator.fillBulker(TraverserIterator.java:69)
at org.apache.tinkerpop.gremlin.server.util.TraverserIterator.hasNext(TraverserIterator.java:56)
at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.handleIterator(TraversalOpProcessor.java:410)
at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.lambda$iterateBytecodeTraversal$0(TraversalOpProcessor.java:222)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
I was able to fix the problem by checking if transaction is open
public InternalVertex getInternalVertex(long vertexId) {
// TODO temporary fix
if (isClosed()) {
return null;
}
//return vertex but potentially check for existence
return vertexCache.get(vertexId, internalVertexRetriever);
}
It was working fine in Janugraph 0.5.3

Important part is that I use "limit" here, without it everything works just fine. Maybe transaction is closing earlier than needed? Also I use remote graph via websocket

Any ideas what there might be wrong?