JanusGraph 0.5.3 potential memory leak
sergeymetallic@...
JG 0.5.3(same on 0.5.2), cannot be reproduced on JG 0.3.2
Backend: ScyllaDB
Indexing backend: Elasticsearch

Steps to reproduce:
1) Create a node with a composite index on the field "X"
2) Create another kind of node (Y) and fill it with a lot of data (several million nodes)
3) Create edges with label L between node X and all the Y nodes
4) Execute the following query in Gremlin: `g.V().has("X","value").out("Y").valueMap(true)`
5) The query should time out after some time

The main idea: only Scylla/Cassandra is involved in the query.

Expected result: JanusGraph operates normally.
Observed result: JanusGraph starts consuming all the allocated memory, and one CPU core is loaded at 100%; another execution loads another core, and so on until none are available. The CPU load and memory consumption continue even if there is no further interaction with the system. In the end JG becomes unresponsive. Flame chart looks like this:
sergeymetallic@...
After some research I figured out that rolling back this PR https://github.com/JanusGraph/janusgraph/pull/2080/files# helps
Hi there! What you describe looks very interesting and is definitely not intended behavior. You mentioned my PR, which seems to cause these troubles. That's quite interesting, because that PR was actually merged to *improve* memory handling, not *worsen* it :P Since the PR is rather small and you have probably already had a look at the changes it made: did you find anything that looks suspicious right away? I would be happy to find and fix this bug, and it would be great if you shared everything you have already found out.
Thank you for reporting this bug!
That's interesting. The one difference I see is that the code now calls `rs.iterator()` immediately (and not lazily, as it did previously). That said, I didn't check whether that's the root cause of the problem. `rs.iterator()` may cause some issues with memory management in that place (line 328 in the PR), but it should be verified. I guess we need to check whether `rs.iterator()` adds any memory pressure during iterator construction. My point is that `Lazy.of` (which was removed in the PR) memoizes the computation, so repeated calls to `lazyList.get()` always return the same object, whereas repeated calls to `rs.iterator()` create new, distinct iterators. That said, it's just a spontaneous guess and the problem might be something else.
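To illustrate the memoization point (a minimal sketch, not JanusGraph's actual `Lazy` class): a memoizing supplier hands back the same object on every call, while repeated `iterator()` calls each construct a fresh iterator object.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

public class LazyVsIterator {

    // Hypothetical stand-in for the removed Lazy.of: caches the first computed value.
    static <T> Supplier<T> lazyOf(Supplier<T> delegate) {
        return new Supplier<T>() {
            private T cached;
            @Override
            public synchronized T get() {
                if (cached == null) cached = delegate.get();
                return cached;
            }
        };
    }

    // Memoized: both get() calls return the very same list object.
    static boolean memoizedSameObject() {
        Supplier<List<Integer>> lazy = lazyOf(() -> Arrays.asList(1, 2, 3));
        return lazy.get() == lazy.get();
    }

    // Plain iterator(): every call constructs a new iterator object.
    static boolean freshIteratorsDistinct() {
        List<Integer> rows = Arrays.asList(1, 2, 3);
        return rows.iterator() != rows.iterator();
    }

    public static void main(String[] args) {
        System.out.println(memoizedSameObject());      // true
        System.out.println(freshIteratorsDistinct());  // true
    }
}
```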
sergeymetallic@...
I did not figure out the cause of the problem, but what is interesting is that the CPU does not recover even after an hour or two, and the process continues reading from Scylla at a pretty high rate. It looks like the reading process was not interrupted properly. The root cause of the issue is not obvious and may require deeper profiling.
One more thing I noticed is that previously we were passing an `Iterable` to `ofStaticBuffer`, and now we are passing an `Iterator`, and those two overloads are actually computed differently.
Here are the first and the second method:

`private static <E,D> EntryList of(Iterable<E> elements, StaticArrayEntry.GetColVal<E,D> getter, StaticArrayEntry.DataHandler<D> dataHandler)`

`private static <E,D> EntryList of(Iterator<E> elements, StaticArrayEntry.GetColVal<E,D> getter, StaticArrayEntry.DataHandler<D> dataHandler)`

If we check the code, their implementations are slightly different. The first method traverses `elements` twice and computes something on the first pass, whereas the second method traverses `elements` only once. I now understand why we used `Lazy.of` previously: it's because we were looping over `elements` twice instead of once. I guess the main problem with the previous model was that we were adding all elements into an ArrayList inside the `Lazy.of` code. I think we can change the old solution to return an `Iterable` as well, but without calling `iterator` on the result set twice. That said, these are just some quick observations; I didn't go deep into the logic.
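The two-pass vs. one-pass split can be sketched like this (a simplified analogue using `int[]` instead of the real byte-buffer entry lists; `ofIterable`/`ofIterator` are hypothetical stand-ins for the two overloads described above):

```java
import java.util.Arrays;
import java.util.Iterator;

public class TwoPass {

    // Sketch of the Iterable overload: pass 1 measures, pass 2 copies,
    // so exactly one array of exactly the right size is allocated.
    static int[] ofIterable(Iterable<Integer> elems) {
        int n = 0;
        for (int ignored : elems) n++;        // first traversal: exact count
        int[] out = new int[n];               // single exact-size allocation
        int i = 0;
        for (int e : elems) out[i++] = e;     // second traversal: copy
        return out;
    }

    // Sketch of the Iterator overload: a single pass that grows the array
    // by doubling whenever it runs out of space (the "ensureSpace" pattern).
    static int[] ofIterator(Iterator<Integer> elems) {
        int[] out = new int[4];
        int i = 0;
        while (elems.hasNext()) {
            if (i == out.length) out = Arrays.copyOf(out, out.length * 2);
            out[i++] = elems.next();
        }
        return Arrays.copyOf(out, i);         // trim to the final size
    }
}
```

The `Iterable` version can only make its two passes because `iterator()` can be called twice, which is exactly the capability a bare `Iterator` lacks.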
What @porunov mentions looks quite interesting. When I made the change in the code, I didn't actually notice that I changed which overload of `ofStaticBuffer` is used. But as you mentioned, the reason to use `Lazy.of` is gone in the newer version using the `Iterator`, and thus looping twice cannot be the issue.
> I think, we can change the old solution to return an Iterable as well but don't call `iterator` for resultSet 2 times.

What exactly do you mean by that? Do you mean to change the implementation of `ofStaticBuffer`? One thing that I've found is that `StaticArrayEntryList.of(Iterator<E> ... ...)` repeatedly calls a self-implemented method called `ensureSpace`, which allocates a new array twice as large as the old one and copies the entries over to the new one. Although the JVM should GC the old (and now unused) array, this behavior seems prone to causing memory problems if the unused arrays are not dropped correctly. This method is not used in the `StaticArrayEntryList.of(Iterable<E> ... ...)` implementation.
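A back-of-the-envelope way to see the allocation pressure of grow-by-doubling (a sketch; the real `ensureSpace` works on byte arrays, and the initial capacity here is an assumption): counting every intermediate array, the transient allocations approach twice the final size, and all but the last array become garbage the GC must reclaim.

```java
public class EnsureSpaceCost {

    // Total elements allocated across all arrays when growing from an
    // assumed initial capacity of 4 by doubling, until n elements fit.
    static long totalAllocated(int n) {
        long total = 4;   // the initial array
        int cap = 4;
        for (int size = 0; size < n; size++) {
            if (size == cap) {      // "ensureSpace": double and copy
                cap *= 2;
                total += cap;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // For n = 1,000,000 the doublings allocate roughly 2n elements in total.
        System.out.println(totalAllocated(1_000_000));
    }
}
```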
> What exactly do you mean by that? Do you mean to change the implementation of `ofStaticBuffer`?

I mean that we possibly need to change the logic back to use `StaticArrayEntryList.of(Iterable<E> ... ...)` instead of `StaticArrayEntryList.of(Iterator<E> ... ...)`. If so, we may need to use `Lazy.of` again, but then we need to think about what exactly it returns (previously it returned an ArrayList, which is again additional computation that would be better to avoid). We may also think about improving `StaticArrayEntryList.of(Iterator<E> ... ...)` to not cause memory problems, but I haven't looked deep into the logic yet. The first thing I'm thinking about: maybe we could change `StaticArrayEntryList.of(Iterator<E> ... ...)` to have the same logic as `StaticArrayEntryList.of(Iterable<E> ... ...)`. Of course, we can't use the iterator twice, but we could store intermediate elements in some singly linked list. I guess something like:

`class SinglyLinkedList<E> { E value; SinglyLinkedList<E> nextElement; }`

That said, I didn't compare the space and time complexity of `StaticArrayEntryList.of(Iterable<E> ... ...)` vs `StaticArrayEntryList.of(Iterator<E> ... ...)`.
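The singly-linked-list idea could look something like this as a single-pass sketch (hypothetical names, and `int` elements instead of the real byte buffers): buffer the iterator's elements in a linked list while counting them, then allocate the final array exactly once.

```java
import java.util.Iterator;

public class LinkedBuffer {

    // One node per buffered element, as in the SinglyLinkedList sketch above.
    static final class Node<E> {
        final E value;
        Node<E> next;
        Node(E value) { this.value = value; }
    }

    // Single pass over the iterator: buffer into a linked list while
    // counting, then allocate the exact-size array once and fill it.
    static int[] collect(Iterator<Integer> it) {
        Node<Integer> head = null, tail = null;
        int n = 0;
        while (it.hasNext()) {
            Node<Integer> node = new Node<>(it.next());
            if (head == null) head = node; else tail.next = node;
            tail = node;
            n++;
        }
        int[] out = new int[n];
        int i = 0;
        for (Node<Integer> cur = head; cur != null; cur = cur.next) {
            out[i++] = cur.value;
        }
        return out;
    }
}
```

Compared with grow-by-doubling, this trades per-element node overhead for never copying the backing array.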
owner.mad.epa@...
I ran a simple benchmark that reproduces the OOM problem with the iterator version:
https://gist.github.com/mad/df729c6a27a7ed224820cdd27209bade

Result:

| Benchmark | (size) | (valueSize) | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- | --- | --- |
| StaticArrayEntryListBenchmark.iterable | 10000 | 50 | thrpt | 5 | 2738.330 | ± 151.820 | ops/s |
| StaticArrayEntryListBenchmark.iterable | 10000 | 1000 | thrpt | 5 | 430.655 | ± 34.286 | ops/s |
| StaticArrayEntryListBenchmark.iterable | 10000 | 5000 | thrpt | 5 | 116.830 | ± 7.664 | ops/s |
| StaticArrayEntryListBenchmark.iterable | 100000 | 50 | thrpt | 5 | 206.853 | ± 36.894 | ops/s |
| StaticArrayEntryListBenchmark.iterable | 100000 | 1000 | thrpt | 5 | 43.632 | ± 1.952 | ops/s |
| StaticArrayEntryListBenchmark.iterable | 100000 | 5000 | thrpt | 5 | 12.148 | ± 0.444 | ops/s |
| StaticArrayEntryListBenchmark.iterator | 10000 | 50 | thrpt | 5 | 1447.668 | ± 484.155 | ops/s |
| StaticArrayEntryListBenchmark.iterator | 10000 | 1000 | thrpt | 5 | 157.839 | ± 17.818 | ops/s |
| StaticArrayEntryListBenchmark.iterator | 10000 | 5000 | thrpt | 5 | 31.548 | ± 10.991 | ops/s |
| StaticArrayEntryListBenchmark.iterator | 100000 | 50 | thrpt | 5 | 177.756 | ± 4.327 | ops/s |
| StaticArrayEntryListBenchmark.iterator | 100000 | 1000 | thrpt | 5 | 25.456 | ± 0.736 | ops/s |
| StaticArrayEntryListBenchmark.iterator | 100000 | 5000 | | | java.lang.OutOfMemoryError: Java heap space | | |
Hey @mad, thanks for your benchmark code! I ran a few experiments with it today and figured out that creating an `Iterable` from the `Iterator` already seems to solve the problem. I added a corresponding `@Benchmark` method that wraps the iterator in an `Iterable` and got these results:
| Benchmark | (size) | (valueSize) | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- | --- | --- |
| StaticArrayEntryListBenchmark.iterable | 10000 | 50 | thrpt | 2 | 3954.258 | | ops/s |
| StaticArrayEntryListBenchmark.iterable | 10000 | 1000 | thrpt | 2 | 305.872 | | ops/s |
| StaticArrayEntryListBenchmark.iterable | 10000 | 5000 | thrpt | 2 | 85.734 | | ops/s |
| StaticArrayEntryListBenchmark.iterable | 100000 | 50 | thrpt | 2 | 224.861 | | ops/s |
| StaticArrayEntryListBenchmark.iterable | 100000 | 1000 | thrpt | 2 | 19.816 | | ops/s |
| StaticArrayEntryListBenchmark.iterable | 100000 | 5000 | thrpt | 2 | 7.058 | | ops/s |
| StaticArrayEntryListBenchmark.iterator | 10000 | 50 | thrpt | 2 | 1619.764 | | ops/s |
| StaticArrayEntryListBenchmark.iterator | 10000 | 1000 | thrpt | 2 | 142.065 | | ops/s |
| StaticArrayEntryListBenchmark.iterator | 10000 | 5000 | thrpt | 2 | 27.785 | | ops/s |
| StaticArrayEntryListBenchmark.iterator | 100000 | 50 | thrpt | 2 | 181.209 | | ops/s |
| StaticArrayEntryListBenchmark.iterator | 100000 | 1000 | thrpt | 2 | 17.115 | | ops/s |
The throughput is almost as high as with the `Iterable`, and even the OOM no longer occurs. If that also fixes the original problem stated at the beginning of this thread, the solution is just a `() ->` away!
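For context, `Iterable` is a functional interface, so the `() ->` wrap presumably looked something like this (a sketch with hypothetical names, not the actual benchmark code): the lambda's `iterator()` simply hands back the captured iterator, which works fine for a single traversal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IteratorAsIterable {

    // Adapt an Iterator to an Iterable with a lambda. The for-each loop
    // below calls wrapped.iterator() once and drains the captured iterator.
    static List<Integer> collectOnce(Iterator<Integer> it) {
        Iterable<Integer> wrapped = () -> it;
        List<Integer> out = new ArrayList<>();
        for (int x : wrapped) out.add(x);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collectOnce(Arrays.asList(1, 2, 3).iterator()));
    }
}
```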
Sadly, that quick fix is not only fast but also incorrect. It requires iterating the `Iterator` twice, which produces incorrect results.
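The failure mode is easy to demonstrate (a minimal sketch): since the lambda keeps returning the same underlying iterator, the first traversal exhausts it and the second traversal sees nothing, so any two-pass consumer gets wrong results.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ExhaustedWrap {

    // A lambda-wrapped iterator supports only ONE traversal: the second
    // for-each loop receives the same, already-exhausted iterator back.
    static int secondPassCount(List<Integer> data) {
        Iterator<Integer> it = data.iterator();
        Iterable<Integer> wrapped = () -> it;
        for (int ignored : wrapped) { }          // first pass drains it
        int count = 0;
        for (int ignored : wrapped) count++;     // second pass is empty
        return count;
    }

    public static void main(String[] args) {
        System.out.println(secondPassCount(Arrays.asList(1, 2, 3))); // 0
    }
}
```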
Opened the issue about this potential bug here: https://github.com/JanusGraph/janusgraph/issues/2524
Boxuan Li
Can someone share how you ran the benchmark provided by @mad (e.g. which JMH version and which JanusGraph version you used)? I ran the benchmark on master (f19df6), but I see OOM errors for both the iterator and the iterable versions. Furthermore, I don't see any OOM report in the final result (JMH simply omits runs with exceptions from the final report).
My environment:
My dependencies:
My benchmark results:
Boxuan Li
After understanding more about the context, I feel https://gist.github.com/mad/df729c6a27a7ed224820cdd27209bade is not a fair comparison between the iterator and iterable versions, because it assumes all entries are loaded into memory at once, which isn't necessarily true in real-world scenarios where the input is an `AsyncResultSet` that uses paging.
The benefit of the iterator version is avoiding pre-allocating a huge chunk of memory for the byte array. I found some flaws in it (reported at https://github.com/JanusGraph/janusgraph/issues/2524#issuecomment-808857502), but I'm not sure whether that is the root cause. @sergey, do you see any OOM exception when you encounter the issue (JG eats all the memory and becomes unresponsive)? If you could share a heap dump, that would be very helpful as well. Best regards, Boxuan
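To illustrate why paging matters for the comparison (a generic sketch with hypothetical names, not the DataStax driver's actual `AsyncResultSet` API): with a paged source, only the current page needs to live in memory while iterating, which a benchmark over a fully materialized list cannot capture.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

public class PagedSource {

    // Builds an iterator over a paged source. fetchPage is a hypothetical
    // page fetcher: given a page number, it returns that page's rows, or an
    // empty list when there are no more pages. Only one page is held at a time.
    static Iterator<Integer> pagedIterator(IntFunction<List<Integer>> fetchPage) {
        return new Iterator<Integer>() {
            private int pageNo = 0;
            private Iterator<Integer> page = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                while (!page.hasNext()) {
                    List<Integer> next = fetchPage.apply(pageNo++);
                    if (next.isEmpty()) return false;   // no more pages
                    page = next.iterator();             // swap in the new page
                }
                return true;
            }

            @Override
            public Integer next() {
                if (!hasNext()) throw new NoSuchElementException();
                return page.next();
            }
        };
    }

    // Drains three small hardcoded pages through the paged iterator.
    static List<Integer> demo() {
        List<List<Integer>> pages = Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(3, 4), Arrays.asList(5));
        Iterator<Integer> it = pagedIterator(
                p -> p < pages.size() ? pages.get(p) : Collections.emptyList());
        List<Integer> out = new ArrayList<>();
        it.forEachRemaining(out::add);
        return out;
    }
}
```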
sergeymetallic@...
After rolling back the PR I mentioned at the beginning of the topic, we do not experience any issues. Even back then it was not an "out of memory" error; the process just ate one full CPU core and never recovered. Once all the CPUs are busy, we cannot make any more queries/calls to JanusGraph.
Boxuan Li
FYI: we recently pushed a bug fix, https://github.com/JanusGraph/janusgraph/pull/2536, which might be related to the problem you encountered. It will be released in 0.6.0.