Java Heap Space - Vertex.edges


Joe Obernberger
 

Thank you Marc and Boxuan - I tried using vertex cut, but it appears to lead to graph corruption (i.e., zombie nodes; see the thread on 'graph corruption').  I agree that I need a new approach.  This proxy approach is hurting my brain, but I think it will work for what I want to do.  Awesome reference - thank you Boxuan!

-Joe

On 8/31/2022 12:46 AM, Boxuan Li wrote:
Hi Joe,

Unfortunately, JanusGraph does not support super nodes (vertices with millions of neighbors) well. And yes, you probably need to remodel your graph. Alternatively, there is a POC that allows you to create proxy nodes when a node becomes a super node: https://github.com/JanusGraph/janusgraph/discussions/2717 but it also requires that you write some custom client code.

Let me know if you have any questions.

Best,
Boxuan



Boxuan Li
 

Also, your observation is correct that `inE().toList();` can lead to OOM simply because the `outE()` size is very large. As Marc pointed out, in this case you have to specify the label, `inE('your-label')`, to avoid OOM. Of course, this does not solve the real problem, because you would still run into trouble whenever you need to traverse `outE()`. You need to either revisit your data model or try out the POC I mentioned in the previous comment.


Boxuan Li
 

Hi Joe,

Unfortunately, JanusGraph does not support super nodes (vertices with millions of neighbors) well. And yes, you probably need to remodel your graph. Alternatively, there is a POC that allows you to create proxy nodes when a node becomes a super node: https://github.com/JanusGraph/janusgraph/discussions/2717 but it also requires that you write some custom client code.
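To make the proxy-node idea a bit more concrete, here is a minimal sketch in plain Java. It is not the code from the linked discussion, and it does not touch any JanusGraph API; it only illustrates one possible way client code could spread the edges of a super node over a fixed number of proxy vertices by hashing the neighbor id. The class name, the `#proxy-` naming scheme, and the vertex ids are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: not taken from the JanusGraph POC.
class ProxyNodeSketch {

    /** Picks one of numProxies proxy vertices for the edge between superNodeId and neighborId. */
    static String proxyFor(String superNodeId, String neighborId, int numProxies) {
        // floorMod keeps the bucket non-negative even when hashCode() is negative.
        int bucket = Math.floorMod(neighborId.hashCode(), numProxies);
        return superNodeId + "#proxy-" + bucket;
    }

    /** Counts how many edges each proxy would hold for neighbors n0 .. n(edgeCount-1). */
    static Map<String, Integer> distribute(String superNodeId, int edgeCount, int numProxies) {
        Map<String, Integer> edgesPerProxy = new HashMap<>();
        for (int i = 0; i < edgeCount; i++) {
            String proxy = proxyFor(superNodeId, "n" + i, numProxies);
            edgesPerProxy.merge(proxy, 1, Integer::sum);
        }
        return edgesPerProxy;
    }

    public static void main(String[] args) {
        // Made-up sizes: one million edges spread over 64 proxies.
        Map<String, Integer> counts = distribute("correlation-42", 1_000_000, 64);
        int largest = counts.values().stream().mapToInt(Integer::intValue).max().orElse(0);
        System.out.println("proxies used: " + counts.size() + ", largest bucket: " + largest);
    }
}
```

In a scheme like this, the super node itself would only keep one edge per proxy, and each proxy would hold a fraction of the original edges. The cost is that every read or write of those edges has to go through the same bucketing logic, which is the kind of custom client code mentioned above.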

Let me know if you have any questions.

Best,
Boxuan


hadoopmarc@...
 

Hi Joe,

The section in the blog "When Predicate Pushdown Fails" shows that you definitely have to include the edge label, so: .inE('your-label')

Best wishes,   Marc


Joe Obernberger
 

Thank you Marc - I tried something like this:
List<Edge> edgeList = traversal.V().has("myId", myId).inE().toList();

myId is an indexed field.  This also runs out of memory if the outE() size is very large; at least, that's what appears to be happening.  The size of the inE is small (fewer than 10 nodes), while the outE size can be very large.  It seems that we can't have graphs with a large number of edges on a single node.  Such a graph can also result in very large partition sizes in Cassandra (in my case about 650 MBytes; see the histogram below).  Given this, would I need to redesign the graph?  What I'm trying now is limiting the number of edges on a node, but that seems opposed to why I'm using a graph in the first place.

nodetool tablehistograms graphsource.edgestore
graphsource/edgestore histograms
Percentile      Read Latency     Write Latency          SSTables    Partition Size        Cell Count
                    (micros)          (micros)                             (bytes)
50%                   379.02             35.43              8.00               372                 5
75%                   379.02             42.51              8.00               535                 6
95%                   545.79             51.01              8.00               535                 8
98%                   545.79             51.01              8.00              1109                10
99%                   545.79             51.01              8.00              1916                14
Min                   219.34             20.50              6.00               125                 0
Max                   545.79             51.01              8.00         668489532           4866323
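
As a rough reading aid for the Max row above, the following plain-Java sketch converts the largest partition size to MiB and divides it by the largest cell count. The two input numbers are copied from the histogram; the class and method names are made up for illustration. It assumes that the largest partition and the largest cell count belong to the same row, which the histogram does not actually guarantee.

```java
// Hypothetical helper for interpreting the nodetool tablehistograms output above.
class PartitionStats {

    /** Average size of one cell in a partition, in bytes. */
    static double bytesPerCell(long partitionBytes, long cellCount) {
        return (double) partitionBytes / cellCount;
    }

    /** Converts a byte count to mebibytes (1 MiB = 1024 * 1024 bytes). */
    static double toMebibytes(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long maxPartitionBytes = 668_489_532L; // "Partition Size" in the Max row
        long maxCellCount = 4_866_323L;        // "Cell Count" in the Max row

        System.out.printf("largest partition: %.1f MiB%n", toMebibytes(maxPartitionBytes));
        System.out.printf("average bytes per cell: %.1f%n",
                bytesPerCell(maxPartitionBytes, maxCellCount));
    }
}
```

Under that assumption the largest row averages a bit under 140 bytes per cell across nearly 4.9 million cells, which fits the picture of a single vertex carrying millions of edges.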

-Joe

On 8/28/2022 4:50 AM, hadoopmarc@... wrote:
Hi Joe,

Can you take a look at this blog from Boxuan Li:  https://li-boxuan.medium.com/janusgraph-deep-dive-part-3-speed-up-edge-queries-3b9eb5ba34f8

In general, it is better to use the TinkerPop API starting with g.V(). This makes sure you do not skip query optimizations. In addition, it makes your code more portable with respect to other TinkerPop-compatible graph systems.

I am not sure though if this will really help for your use case, but give it a try!

Best wishes,     Marc



hadoopmarc@...
 

Hi Joe,

Can you take a look at this blog from Boxuan Li:  https://li-boxuan.medium.com/janusgraph-deep-dive-part-3-speed-up-edge-queries-3b9eb5ba34f8

In general, it is better to use the TinkerPop API starting with g.V(). This makes sure you do not skip query optimizations. In addition, it makes your code more portable with respect to other TinkerPop-compatible graph systems.

I am not sure though if this will really help for your use case, but give it a try!

Best wishes,     Marc


Joe Obernberger
 

Hi all - I'm getting the following exception:

org.janusgraph.core.JanusGraphException: Could not execute operation due to backend exception
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:54)
        at org.janusgraph.diskstorage.BackendTransaction.executeRead(BackendTransaction.java:488)
        at org.janusgraph.diskstorage.BackendTransaction.edgeStoreQuery(BackendTransaction.java:271)
        at org.janusgraph.graphdb.database.StandardJanusGraph.edgeQuery(StandardJanusGraph.java:490)
        at org.janusgraph.graphdb.transaction.StandardJanusGraphTx$2.lambda$execute$1(StandardJanusGraphTx.java:1320)
        at org.janusgraph.graphdb.query.profile.QueryProfiler.profile(QueryProfiler.java:107)
        at org.janusgraph.graphdb.query.profile.QueryProfiler.profile(QueryProfiler.java:99)
        at org.janusgraph.graphdb.query.profile.QueryProfiler.profile(QueryProfiler.java:95)
        at org.janusgraph.graphdb.transaction.StandardJanusGraphTx$2.lambda$execute$2(StandardJanusGraphTx.java:1320)
        at org.janusgraph.graphdb.vertices.CacheVertex.loadRelations(CacheVertex.java:73)
        at org.janusgraph.graphdb.transaction.StandardJanusGraphTx$2.execute(StandardJanusGraphTx.java:1320)
        at org.janusgraph.graphdb.transaction.StandardJanusGraphTx$2.execute(StandardJanusGraphTx.java:1231)
        at org.janusgraph.graphdb.query.QueryProcessor$LimitAdjustingIterator.getNewIterator(QueryProcessor.java:206)
        at org.janusgraph.graphdb.query.LimitAdjustingIterator.hasNext(LimitAdjustingIterator.java:69)
        at org.janusgraph.graphdb.util.CloseableIteratorUtils$1.computeNext(CloseableIteratorUtils.java:49)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:146)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:141)
        at org.janusgraph.graphdb.query.ResultSetIterator.nextInternal(ResultSetIterator.java:55)
        at org.janusgraph.graphdb.query.ResultSetIterator.<init>(ResultSetIterator.java:45)
        at org.janusgraph.graphdb.query.QueryProcessor.iterator(QueryProcessor.java:68)
        at org.janusgraph.graphdb.query.QueryProcessor.iterator(QueryProcessor.java:49)
        at org.janusgraph.graphdb.vertices.AbstractVertex.edges(AbstractVertex.java:194)
        at com.comp.helios.heliosgraphcorrelationservice.threads.HandleCorrelationIDThread.handleCorrelationID(HandleCorrelationIDThread.java:145)
        at com.comp.helios.heliosgraphcorrelationservice.threads.HandleCorrelationIDThread.run(HandleCorrelationIDThread.java:68)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.janusgraph.diskstorage.PermanentBackendException: Permanent exception while executing backend operation EdgeStoreQuery
        at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:79)
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:52)
        ... 26 more
Caused by: java.lang.OutOfMemoryError: Java heap space

The line of interest:

Iterator<Edge> edgeIt = correlationVertex.edges(Direction.IN);

That vertex (correlationVertex) may have a lot of outbound edges - maybe in the millions.  Does this call try to load those into RAM? The number of inbound edges is small - maybe 5.
Ideas?

Thank you!

-Joe

