JanusGraph 0.5.3 potential memory leak


Boxuan Li
 

FYI: we recently pushed a bug fix https://github.com/JanusGraph/janusgraph/pull/2536 which might be related to the problem you encountered. This will be released in 0.6.0.

On Mar 28, 2021, at 11:00 PM, sergeymetallic@... wrote:

> After rolling back the PR I mentioned at the beginning of the topic, we do not experience any issues. Even back then it was not "out of memory"; the process just ate one full CPU core and never recovered. Once all the CPUs are busy, we cannot make any more queries/calls to JanusGraph.


sergeymetallic@...
 

After rolling back the PR I mentioned at the beginning of the topic, we do not experience any issues. Even back then it was not "out of memory"; the process just ate one full CPU core and never recovered. Once all the CPUs are busy, we cannot make any more queries/calls to JanusGraph.


Boxuan Li
 

After understanding more about the context, I feel https://gist.github.com/mad/df729c6a27a7ed224820cdd27209bade is not a fair comparison between the iterator and iterable versions, because it assumes all entries are loaded into memory at once, which isn't necessarily true in real-world scenarios where the input is an AsyncResultSet that uses paging.

The benefit of the iterator version is that it avoids pre-allocating a huge chunk of memory for the byte array. I found some flaws in it (reported at https://github.com/JanusGraph/janusgraph/issues/2524#issuecomment-808857502), but I am not sure whether that is the root cause.
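For intuition, here is a toy illustration of such a paging source (hypothetical names, not the DataStax driver API): only the current page is held in memory and the next page is fetched on demand, so an entry-list builder never sees all entries up front.

import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

// Toy stand-in for a paged result set such as an AsyncResultSet:
// pageFetcher (hypothetical) returns the next page, or an empty list when done.
class PagedIterator<E> implements Iterator<E> {
    private final Supplier<List<E>> pageFetcher;
    private Iterator<E> currentPage;

    PagedIterator(Supplier<List<E>> pageFetcher) {
        this.pageFetcher = pageFetcher;
        this.currentPage = pageFetcher.get().iterator();
    }

    @Override
    public boolean hasNext() {
        // Advance to the next non-empty page; only one page is resident at a time.
        while (!currentPage.hasNext()) {
            List<E> nextPage = pageFetcher.get();
            if (nextPage.isEmpty()) {
                return false;
            }
            currentPage = nextPage.iterator();
        }
        return true;
    }

    @Override
    public E next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return currentPage.next();
    }
}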

@sergey, do you see any OOM exception when you encounter the issue (JG eats all the memory and becomes unresponsive)? If you could share a heap dump, that would be very helpful as well.

Best regards,
Boxuan


Boxuan Li
 

Can someone share how you ran the benchmark provided by @mad (e.g., which JMH version and which JanusGraph version you used)? I ran the benchmark on master (f19df6), but I see OOM errors for both the iterator and the iterable versions. Furthermore, I don't see any OOM report in the final result (JMH simply omits runs with exceptions from the final report).

My environment:

# JMH version: 1.29
# VM version: JDK 1.8.0_275, OpenJDK 64-Bit Server VM, 25.275-b01
# VM invoker: /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/bin/java
# VM options: -Dvisualvm.id=32547661350356 -Dfile.encoding=UTF-8 -Xmx1G

My dependencies:

<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-core</artifactId>
    <version>1.29</version>
</dependency>
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-generator-annprocess</artifactId>
    <version>1.29</version>
    <scope>provided</scope>
</dependency>

My benchmark results:

Benchmark                               (size)  (valueSize)   Mode  Cnt     Score      Error  Units
StaticArrayEntryListBenchmark.iterable   10000           50  thrpt    5  3653.903 ± 1485.691  ops/s
StaticArrayEntryListBenchmark.iterable   10000         1000  thrpt    5   356.528 ±  100.197  ops/s
StaticArrayEntryListBenchmark.iterable   10000         5000  thrpt    5    90.776 ±   47.783  ops/s
StaticArrayEntryListBenchmark.iterable  100000           50  thrpt    5   202.407 ±   22.577  ops/s
StaticArrayEntryListBenchmark.iterable  100000         1000  thrpt    5    38.114 ±    1.196  ops/s
StaticArrayEntryListBenchmark.iterator   10000           50  thrpt    5  2079.672 ±  312.171  ops/s
StaticArrayEntryListBenchmark.iterator   10000         1000  thrpt    5   170.326 ±   33.554  ops/s
StaticArrayEntryListBenchmark.iterator   10000         5000  thrpt    5    31.522 ±    2.774  ops/s
StaticArrayEntryListBenchmark.iterator  100000           50  thrpt    5   159.831 ±   44.197  ops/s
StaticArrayEntryListBenchmark.iterator  100000         1000  thrpt    5    18.367 ±    4.123  ops/s


Oleksandr Porunov
 

Opened the issue about this potential bug here: https://github.com/JanusGraph/janusgraph/issues/2524


rngcntr
 

Sadly, that quick fix is not only fast but also incorrect. It requires iterating the `Iterator` twice, which produces incorrect results.
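To illustrate the generic pitfall (a toy example, not JanusGraph code):

import java.util.Arrays;
import java.util.Iterator;

public class IteratorReuse {
    public static void main(String[] args) {
        // An Iterator is single-use: once drained, a second pass sees nothing.
        Iterator<Integer> it = Arrays.asList(1, 2, 3).iterator();
        it.forEachRemaining(x -> { });     // first pass consumes all elements
        System.out.println(it.hasNext()); // false: a second pass yields no data
    }
}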


rngcntr
 

Hey @mad, thanks for your benchmark code! I ran a few experiments with it today and figured out that creating an Iterable from the Iterator seems to already solve the problem. I added the following function to the benchmark:

@Benchmark
public void iterator_iterable(Blackhole bh) {
    EntryList result = StaticArrayEntryList.ofStaticBuffer(() -> entries.iterator(), StaticArrayEntry.ENTRY_GETTER);
    bh.consume(result);
}


And the results look very promising:

Benchmark                                        (size)  (valueSize)   Mode  Cnt     Score   Error  Units
StaticArrayEntryListBenchmark.iterable            10000           50  thrpt    2  3954.258          ops/s
StaticArrayEntryListBenchmark.iterable            10000         1000  thrpt    2   305.872          ops/s
StaticArrayEntryListBenchmark.iterable            10000         5000  thrpt    2    85.734          ops/s
StaticArrayEntryListBenchmark.iterable           100000           50  thrpt    2   224.861          ops/s
StaticArrayEntryListBenchmark.iterable           100000         1000  thrpt    2    19.816          ops/s
StaticArrayEntryListBenchmark.iterable           100000         5000  thrpt    2     7.058          ops/s
StaticArrayEntryListBenchmark.iterator            10000           50  thrpt    2  1619.764          ops/s
StaticArrayEntryListBenchmark.iterator            10000         1000  thrpt    2   142.065          ops/s
StaticArrayEntryListBenchmark.iterator            10000         5000  thrpt    2    27.785          ops/s
StaticArrayEntryListBenchmark.iterator           100000           50  thrpt    2   181.209          ops/s
StaticArrayEntryListBenchmark.iterator           100000         1000  thrpt    2    17.115          ops/s
StaticArrayEntryListBenchmark.iterator           100000         5000  java.lang.OutOfMemoryError: Java heap space
StaticArrayEntryListBenchmark.iterator_iterable   10000           50  thrpt    2  3557.666          ops/s
StaticArrayEntryListBenchmark.iterator_iterable   10000         1000  thrpt    2   331.978          ops/s
StaticArrayEntryListBenchmark.iterator_iterable   10000         5000  thrpt    2    87.827          ops/s
StaticArrayEntryListBenchmark.iterator_iterable  100000           50  thrpt    2   241.963          ops/s
StaticArrayEntryListBenchmark.iterator_iterable  100000         1000  thrpt    2    20.257          ops/s
StaticArrayEntryListBenchmark.iterator_iterable  100000         5000  thrpt    2     7.278          ops/s

The throughput is almost as high as with the Iterable version, and even the OOM no longer occurs. If that also fixes the original problem stated at the beginning of this thread, the solution is just a `() ->` away!
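For the curious: `Iterable` has a single abstract method, `iterator()`, so the `() -> entries.iterator()` lambda satisfies the interface and hands out a fresh `Iterator` on every call, which presumably routes the call to the Iterable-based code path that can safely traverse twice. A minimal illustration of the adapter:

import java.util.Arrays;
import java.util.List;

public class IterableAdapter {
    public static void main(String[] args) {
        List<String> entries = Arrays.asList("a", "b", "c");
        // The lambda produces a new Iterator per call, so the resulting
        // Iterable can be traversed any number of times.
        Iterable<String> asIterable = () -> entries.iterator();
        for (String e : asIterable) System.out.println(e); // first traversal
        for (String e : asIterable) System.out.println(e); // second traversal also works
    }
}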


owner.mad.epa@...
 

I ran a simple benchmark that reproduces the OOM problem with the iterator version:

https://gist.github.com/mad/df729c6a27a7ed224820cdd27209bade

Results:

Benchmark                               (size)  (valueSize)   Mode  Cnt     Score     Error  Units
StaticArrayEntryListBenchmark.iterable   10000           50  thrpt    5  2738.330 ± 151.820  ops/s
StaticArrayEntryListBenchmark.iterable   10000         1000  thrpt    5   430.655 ±  34.286  ops/s
StaticArrayEntryListBenchmark.iterable   10000         5000  thrpt    5   116.830 ±   7.664  ops/s
StaticArrayEntryListBenchmark.iterable  100000           50  thrpt    5   206.853 ±  36.894  ops/s
StaticArrayEntryListBenchmark.iterable  100000         1000  thrpt    5    43.632 ±   1.952  ops/s
StaticArrayEntryListBenchmark.iterable  100000         5000  thrpt    5    12.148 ±   0.444  ops/s
StaticArrayEntryListBenchmark.iterator   10000           50  thrpt    5  1447.668 ± 484.155  ops/s
StaticArrayEntryListBenchmark.iterator   10000         1000  thrpt    5   157.839 ±  17.818  ops/s
StaticArrayEntryListBenchmark.iterator   10000         5000  thrpt    5    31.548 ±  10.991  ops/s
StaticArrayEntryListBenchmark.iterator  100000           50  thrpt    5   177.756 ±   4.327  ops/s
StaticArrayEntryListBenchmark.iterator  100000         1000  thrpt    5    25.456 ±   0.736  ops/s
StaticArrayEntryListBenchmark.iterator  100000         5000  java.lang.OutOfMemoryError: Java heap space


Oleksandr Porunov
 

> What exactly do you mean by that? Do you mean to change the implementation of `ofStaticBuffer`?

I mean that possibly we need to change the logic back to using `StaticArrayEntryList.of(Iterable<E> ... ...)` instead of `StaticArrayEntryList.of(Iterator<E> ... ...)`. If so, we may need to use `Lazy.of` again, but then we need to think about what exactly it returns (previously it returned an ArrayList, but that is again additional computation which would be better to avoid).
We may also think about improving `StaticArrayEntryList.of(Iterator<E> ... ...)` so that it does not cause memory problems, but I haven't looked deep into the logic yet.
The first thing I'm thinking about: maybe we could change `StaticArrayEntryList.of(Iterator<E> ... ...)` to use the same logic as `StaticArrayEntryList.of(Iterable<E> ... ...)`. Of course, we can't use the iterator twice, but we could store intermediate elements in some singly linked list. I guess something like:

class SinglyLinkedList<E> {
    E value;
    SinglyLinkedList<E> nextElement;
}

That said, I didn't compare the space and time complexity of `StaticArrayEntryList.of(Iterable<E> ... ...)` vs `StaticArrayEntryList.of(Iterator<E> ... ...)`.
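A rough sketch of that idea (hypothetical names, not JanusGraph code): consume the iterator exactly once, buffering each element in a linked chain while accumulating the total size, then fill an exactly-sized array in a second walk over the chain:

import java.util.Arrays;
import java.util.Iterator;

public class SinglyLinkedBuffer {
    // Minimal node type, as in the sketch above.
    static final class Node<E> {
        final E value;
        Node<E> next;
        Node(E value) { this.value = value; }
    }

    // One pass over the iterator builds the chain and the total size;
    // the destination array is then allocated exactly once.
    static byte[] concat(Iterator<byte[]> chunks) {
        Node<byte[]> head = null, tail = null;
        int total = 0;
        while (chunks.hasNext()) {
            byte[] chunk = chunks.next();
            total += chunk.length;
            Node<byte[]> node = new Node<>(chunk);
            if (head == null) { head = node; } else { tail.next = node; }
            tail = node;
        }
        byte[] result = new byte[total];
        int pos = 0;
        for (Node<byte[]> n = head; n != null; n = n.next) {
            System.arraycopy(n.value, 0, result, pos, n.value.length);
            pos += n.value.length;
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] out = concat(Arrays.asList(new byte[]{1, 2}, new byte[]{3}).iterator());
        System.out.println(out.length); // prints 3
    }
}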


rngcntr
 

What @porunov mentions looks quite interesting. When I made the change in the code, I didn't actually notice that I changed the signature that is used for `ofStaticBuffer`. But as you mentioned, it now looks like the reason to use `Lazy.of` is gone in the newer version using the `Iterator`, and thus looping twice cannot be the issue.

> I think, we can change the old solution to return an Iterable as well but don't call `iterator` for resultSet 2 times.

What exactly do you mean by that? Do you mean to change the implementation of `ofStaticBuffer`?

One thing I've found is that `StaticArrayEntryList.of(Iterator<E> ... ...)` repeatedly calls a self-implemented method called `ensureSpace`, which allocates a new array twice as large as the old one and copies the entries over. Although the JVM should GC the old (and now unused) array, this behavior seems prone to causing memory problems if the unused arrays are not dropped correctly. This method is not used in the `StaticArrayEntryList.of(Iterable<E> ... ...)` implementation.
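For reference, the doubling pattern described above typically looks like this generic sketch (not the actual JanusGraph implementation). Note that during each copy the old and new arrays are live at the same time, which adds peak memory pressure on top of the garbage the old arrays become:

import java.util.Arrays;

public class GrowableByteArray {
    private byte[] data = new byte[64];
    private int used;

    // Doubles the backing array until the requested extra bytes fit.
    // Each call to Arrays.copyOf briefly keeps both copies reachable.
    private void ensureSpace(int extra) {
        while (used + extra > data.length) {
            data = Arrays.copyOf(data, data.length * 2);
        }
    }

    public void append(byte[] chunk) {
        ensureSpace(chunk.length);
        System.arraycopy(chunk, 0, data, used, chunk.length);
        used += chunk.length;
    }
}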


Oleksandr Porunov
 

One more thing I noticed is that previously we were passing `Iterable` to `ofStaticBuffer`, whereas right now we are passing `Iterator`, and those methods are actually computed differently.
Here are the first and the second method:

private static <E,D> EntryList of(Iterable<E> elements, StaticArrayEntry.GetColVal<E,D> getter, StaticArrayEntry.DataHandler<D> dataHandler)
private static <E,D> EntryList of(Iterator<E> elements, StaticArrayEntry.GetColVal<E,D> getter, StaticArrayEntry.DataHandler<D> dataHandler)

If we check the code, their implementations are slightly different. The first method iterates `elements` twice and computes something in between, whereas the second method iterates `elements` only once.
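To make the difference concrete: with an `Iterable`, the code can make one pass to measure and a second pass to copy, each over a fresh iterator, so the destination array is allocated exactly once. A simplified sketch of that two-pass shape (an assumption about the rough structure, not the actual code):

// Pass 1 measures, pass 2 copies; each for-each loop obtains a fresh Iterator.
static byte[] concatTwoPass(Iterable<byte[]> chunks) {
    int total = 0;
    for (byte[] c : chunks) total += c.length;        // pass 1: compute exact size
    byte[] result = new byte[total];
    int pos = 0;
    for (byte[] c : chunks) {                         // pass 2: copy into place
        System.arraycopy(c, 0, result, pos, c.length);
        pos += c.length;
    }
    return result;
}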

I do understand now why we used `Lazy.of` previously: it's because we were looping over `elements` twice instead of once.
I guess the main problem with the previous model was that we were adding all elements into an ArrayList inside the `Lazy.of` code. I think we can change the old solution to return an Iterable as well, but without calling `iterator` on the resultSet twice.
That said, these are just some quick observations; I didn't go deep into the logic.


sergeymetallic@...
 

I did not figure out the reason for the problem, but what is interesting is that the CPU does not recover even after an hour or two, and the process continues reading from Scylla at a pretty high rate. It looks like the reading process was not interrupted properly. But the root cause of the issue is not obvious and may require deeper profiling.


Oleksandr Porunov
 

Thank you for reporting this bug!

That's interesting. The one difference I see is that now the code performs `rs.iterator()` immediately (and not lazily, as it did previously). That said, I didn't check whether that's the root cause of the problem.
Possibly `rs.iterator()` causes some issues with memory management in that place (line 328) in the PR, but that should be verified. I guess we need to check whether `rs.iterator()` adds any memory pressure during iterator construction.
My point is that `Lazy.of` (which was removed in the PR) memoizes the computation. Thus, repeated calls to `lazyList.get()` will always return the same object, whereas repeated calls to `rs.iterator()` create new, different iterators.
That said, it's just a spontaneous guess, and the problem might be something else.
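For context, the memoization pattern being described works roughly like this (a generic sketch of the idea, not JanusGraph's actual `Lazy` class):

import java.util.function.Supplier;

// Computes the value once on the first get() and caches it, so every
// subsequent call returns the very same object.
final class Lazy<T> {
    private final Supplier<T> supplier;
    private T value;
    private boolean computed;

    private Lazy(Supplier<T> supplier) { this.supplier = supplier; }

    static <T> Lazy<T> of(Supplier<T> supplier) { return new Lazy<>(supplier); }

    synchronized T get() {
        if (!computed) {
            value = supplier.get();
            computed = true;
        }
        return value;
    }
}

// Usage: Lazy<Object> lazyList = Lazy.of(() -> expensiveComputation());
// lazyList.get() == lazyList.get() always holds, unlike repeated rs.iterator() calls.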


rngcntr
 

Hi there!

What you describe looks very interesting and is definitely not intended behavior. You mentioned my PR which seems to cause these troubles. That's quite interesting because this PR was actually merged to *improve* memory handling, not *worsen* it :P

Since the PR is rather small and you have probably already had a look at the changes it made: did you find anything that looks suspicious right away? I would be happy to find and fix this bug, and it would be great if you could share everything you have already found out.


sergeymetallic@...
 

After some research I figured out that rolling back this PR https://github.com/JanusGraph/janusgraph/pull/2080/files# helps.


sergeymetallic@...
 

JG 0.5.3 (same on 0.5.2); cannot be reproduced on JG 0.3.2
Backend: ScyllaDB
Indexing backend: Elasticsearch

Steps to reproduce (a schema sketch follows the list):
1) Create a node with a composite index for the field "X"
2) Create another kind of node (Y) and fill it with a lot of data (several million nodes)
3) Create edges between node X and all the Y nodes with the label L
4) Execute the following query in Gremlin: g.V().has("X","value").out("Y").valueMap(true)
5) The query should time out after some time
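For anyone reproducing this, steps 1-3 correspond roughly to the following schema setup (index and helper names are illustrative):

import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.PropertyKey;
import org.janusgraph.core.schema.JanusGraphManagement;

public class ReproSchema {
    static void defineSchema(JanusGraph graph) {
        JanusGraphManagement mgmt = graph.openManagement();
        // Step 1: property "X" backed by a composite index
        PropertyKey x = mgmt.makePropertyKey("X").dataType(String.class).make();
        mgmt.buildIndex("byX", Vertex.class).addKey(x).buildCompositeIndex();
        // Step 3: edge label "L" for the X -> Y edges
        mgmt.makeEdgeLabel("L").make();
        mgmt.commit();
    }
}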

The main point: only Scylla/Cassandra is involved in the query.

Expected result: Janusgraph operates normally

Observed result: JanusGraph starts consuming all the allocated memory, and one of the CPU cores is loaded at 100%; another execution loads another core, and so on until none are available. The CPU load and memory consumption persist even if there is no further interaction with the system. In the end, JG becomes unresponsive.

The flame chart looks like this: