Janusgraph 0.5.3 potential memory leak
On Mar 28, 2021, at 11:00 PM, sergeymetallic@... wrote:After rolling back the PR I mentioned in the beginning of the topic we do not experience any issues. Even back then it was not "out of memory", but the process just ate one full core of CPU and never recovered. After all the CPUs are busy we cannot make any more queries/calls to JanusGraph.
The benefit of the iterator version is to avoid pre-allocate a huge chunk of memory for the byte array. I found some flaws in it (reported at https://github.com/JanusGraph/janusgraph/issues/2524#issuecomment-808857502) but not sure whether that is the root cause or not.
@sergey, do you see any OOM exception when you encounter the issue (JG eats all the memory and becomes unresponsive)? If you could share a heap dump, that would be very helpful as well.
Best regards,
Boxuan
My environment:
# JMH version: 1.29# VM version: JDK 1.8.0_275, OpenJDK 64-Bit Server VM, 25.275-b01# VM invoker: /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/bin/java# VM options: -Dvisualvm.id=32547661350356 -Dfile.encoding=UTF-8 -Xmx1G
My dependencies:
My benchmark results:<dependency>
<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-core</artifactId>
<version>1.29</version>
</dependency>
<dependency>
<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-generator-annprocess</artifactId>
<version>1.29</version>
<scope>provided</scope>
</dependency>
Benchmark (size) (valueSize) Mode Cnt Score Error Units
StaticArrayEntryListBenchmark.iterable 10000 50 thrpt 5 3653.903 ± 1485.691 ops/s
StaticArrayEntryListBenchmark.iterable 10000 1000 thrpt 5 356.528 ± 100.197 ops/s
StaticArrayEntryListBenchmark.iterable 10000 5000 thrpt 5 90.776 ± 47.783 ops/s
StaticArrayEntryListBenchmark.iterable 100000 50 thrpt 5 202.407 ± 22.577 ops/s
StaticArrayEntryListBenchmark.iterable 100000 1000 thrpt 5 38.114 ± 1.196 ops/s
StaticArrayEntryListBenchmark.iterator 10000 50 thrpt 5 2079.672 ± 312.171 ops/s
StaticArrayEntryListBenchmark.iterator 10000 1000 thrpt 5 170.326 ± 33.554 ops/s
StaticArrayEntryListBenchmark.iterator 10000 5000 thrpt 5 31.522 ± 2.774 ops/s
StaticArrayEntryListBenchmark.iterator 100000 50 thrpt 5 159.831 ± 44.197 ops/s
StaticArrayEntryListBenchmark.iterator 100000 1000 thrpt 5 18.367 ± 4.123 ops/s
Hey @mad, thanks for your benchmark code! I ran a few experiments with it today and figured out that creating an Iterable from the Iterator seems to already solve the problem. I added the following function to the benchmark:
@Benchmark
public void iterator_iterable(Blackhole bh) {
EntryList result = StaticArrayEntryList.ofStaticBuffer(() -> entries.iterator(), StaticArrayEntry.ENTRY_GETTER);
bh.consume(result);
}
And the results look very promising:
Benchmark (size) (valueSize) Mode Cnt Score Error Units StaticArrayEntryListBenchmark.iterable 10000 50 thrpt 2 3954.258 ops/s StaticArrayEntryListBenchmark.iterable 10000 1000 thrpt 2 305.872 ops/s StaticArrayEntryListBenchmark.iterable 10000 5000 thrpt 2 85.734 ops/s StaticArrayEntryListBenchmark.iterable 100000 50 thrpt 2 224.861 ops/s StaticArrayEntryListBenchmark.iterable 100000 1000 thrpt 2 19.816 ops/s StaticArrayEntryListBenchmark.iterable 100000 5000 thrpt 2 7.058 ops/s StaticArrayEntryListBenchmark.iterator 10000 50 thrpt 2 1619.764 ops/s StaticArrayEntryListBenchmark.iterator 10000 1000 thrpt 2 142.065 ops/s StaticArrayEntryListBenchmark.iterator 10000 5000 thrpt 2 27.785 ops/s StaticArrayEntryListBenchmark.iterator 100000 50 thrpt 2 181.209 ops/s StaticArrayEntryListBenchmark.iterator 100000 1000 thrpt 2 17.115 ops/s
StaticArrayEntryListBenchmark.iterator 100000 5000 java.lang.OutOfMemoryError: Java heap space StaticArrayEntryListBenchmark.iterator_iterable 10000 50 thrpt 2 3557.666 ops/s StaticArrayEntryListBenchmark.iterator_iterable 10000 1000 thrpt 2 331.978 ops/s StaticArrayEntryListBenchmark.iterator_iterable 10000 5000 thrpt 2 87.827 ops/s StaticArrayEntryListBenchmark.iterator_iterable 100000 50 thrpt 2 241.963 ops/s StaticArrayEntryListBenchmark.iterator_iterable 100000 1000 thrpt 2 20.257 ops/s StaticArrayEntryListBenchmark.iterator_iterable 100000 5000 thrpt 2 7.278 ops/s
The throughput is almost as high as using Iterable and even the OOM does not occur anymore. If that also fixes the original problem stated at the beginning of this thread, the solution is just a () -> away!
https://gist.github.com/mad/df729c6a27a7ed224820cdd27209bade
Result
Benchmark (size) (valueSize) Mode Cnt Score Error Units
StaticArrayEntryListBenchmark.iterable 10000 50 thrpt 5 2738.330 ± 151.820 ops/s
StaticArrayEntryListBenchmark.iterable 10000 1000 thrpt 5 430.655 ± 34.286 ops/s
StaticArrayEntryListBenchmark.iterable 10000 5000 thrpt 5 116.830 ± 7.664 ops/s
StaticArrayEntryListBenchmark.iterable 100000 50 thrpt 5 206.853 ± 36.894 ops/s
StaticArrayEntryListBenchmark.iterable 100000 1000 thrpt 5 43.632 ± 1.952 ops/s
StaticArrayEntryListBenchmark.iterable 100000 5000 thrpt 5 12.148 ± 0.444 ops/s
StaticArrayEntryListBenchmark.iterator 10000 50 thrpt 5 1447.668 ± 484.155 ops/s
StaticArrayEntryListBenchmark.iterator 10000 1000 thrpt 5 157.839 ± 17.818 ops/s
StaticArrayEntryListBenchmark.iterator 10000 5000 thrpt 5 31.548 ± 10.991 ops/s
StaticArrayEntryListBenchmark.iterator 100000 50 thrpt 5 177.756 ± 4.327 ops/s
StaticArrayEntryListBenchmark.iterator 100000 1000 thrpt 5 25.456 ± 0.736 ops/s
StaticArrayEntryListBenchmark.iterator 100000 5000 java.lang.OutOfMemoryError: Java heap space
What exactly do you mean by that? Do you mean to change the implementation of `ofStaticBuffer`?I mean that possibly we need to change the logic back to use `StaticArrayEntryList.of(Iterable<E> ... ...)` instead of `StaticArrayEntryList.of(Iterator<E> ... ...)`. If so, we may need to use `Lazy.of` again but then we need to think about what exactly it returns (previously it used to return an ArrayList but that's again additional computation which would be better to avoid).
We may also think about improving `StaticArrayEntryList.of(Iterator<E> ... ...)` to not cause memory problems but I didn't look deep into the logic yet.
The first thing which I'm thinking about, maybe we could change `StaticArrayEntryList.of(Iterator<E> ... ...)` to have the same logic as
`StaticArrayEntryList.of(Iterable<E> ... ...)`. Of course, we can't use that iterator 2 times, but we could store intermediate elements inside some Singly Linked List. I guess something like:
class SinglyLinkedList<E> {
E value;
SinglyLinkedList<E> nextElement;
}
That said, I didn't compare space and time complexity of
`StaticArrayEntryList.of(Iterable<E> ... ...)` vs
`StaticArrayEntryList.of(Iterator<E> ... ...)`.
I think, we can change the old solution to return an Iterable as well but don't call `iterator` for resultSet 2 times.What exactly do you mean by that? Do you mean to change the implementation of `ofStaticBuffer`?
One thing that I've found is that `StaticArrayEntryList.of(Iterator<E> ... ...)` repeatedly calls a self-implemented method called `ensureSpace` which allocates a new array twice as large as the old one and copies the entries over to the new one. Although the JVM should GC the old (and unused) array, this behavior seems to me like it is prone to cause memory leaks if the unused arrays are not dropped correctly. This method is not used in the `StaticArrayEntryList.of(Iterable<E> ... ...)` implementation.
Here are the first and the second method:
private static <E,D> EntryList of(Iterable<E> elements, StaticArrayEntry.GetColVal<E,D> getter, StaticArrayEntry.DataHandler<D> dataHandler)
private static <E,D> EntryList of(Iterator<E> elements, StaticArrayEntry.GetColVal<E,D> getter, StaticArrayEntry.DataHandler<D> dataHandler)
If we check the code, their implementation is slightly different. The first methods passes 2 times `elements` and computes something whereas a second method passes `elements` once.
I do understand now why we used lazy `Lazy.of` previously. It's just because we were looping `elements` 2 times instead of once.
I guess, the main problem in the previous model was that we were adding all elements into an ArrayList inside `Lazy.of` code. I think, we can change the old solution to return an Iterable as well but don't call `iterator` for resultSet 2 times.
That said, it's just some quick observations. I didn't go deep into the logic
That's interesting. The one difference I see is that now the code performs `rs.iterator()` immediately (and not lazily as it was previously). That said, I didn't check if that's the root cause of the problem or not.
Probably `rs.iterator()` may cause some issues with memory management in that place (line 328) in the PR but it should be verified. I guess, we need to check if `rs.iterator()` adds any memory pressure during the iterator construction.
My point is that `Lazy.of` (which was removed in the PR) memorizes the computation. Thus, repeated calls to `lazyList.get()` will always return the same object. Whereas repeated calls of `rs.iterator()` creates new different iterators.
That said, it's just a spontaneous guess and the problem might be with something else.
Hi there!
What you describe looks very interesting and is definitely not intended behavior. You mentioned my PR which seems to cause these troubles. That's quite interesting because this PR was actually merged to *improve* memory handling, not *worsen* it :P
Since the PR is rather small and you have probably already had a look at the changes it made: Did you find anything that looks suspicious right away? I would be happy to find this bug and fix it and it would be great if you share everything you already found out.
Backend: scyllaDB
Indexing backend: ElasticSearch
Steps to reproduce:
1) Create a node with a composite index for the field "X"
2) Create another kind (Y) of node and fill with a lot of data (several millions nodes)
3) Create edges between node X and all the nodes Y with the label L
4) Execute the following query in gremlin: g.V().has("X","value").out("Y").valueMap(true)
5) Query should time out after some time
The main idea - only scylla/cassandra is involved in the query
Expected result: Janusgraph operates normally
Observed result: Janusgraph starts consuming all the allocated memory and one of the CPU cores is loaded 100%, another execution will load another core and so on until there are no available. CPU load and CPU consumption happens even if there is no any further interaction with the system. In the end JG becomes unresponsive.
Flame chart looks like this