CQL scaling limit?
madams@...
Hi all,

We've been trying to scale JanusGraph horizontally to keep up with the data throughput, but we're reaching a limit in scaling.
Each indexer counts the number of records it processes. We start with one indexer, and every 5 to 10 minutes we double the number of indexers and check how the overall performance increases. From top down, left to right, these panels represent:
Our JanusGraph configuration looks like:

    storage.backend=cql
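Alongside that, a CQL setup normally carries at least the connection options below; the hostnames and keyspace shown here are placeholders rather than our real values:

    # placeholder contact points and keyspace, not the real deployment values
    storage.hostname=scylla-0,scylla-1,scylla-2
    storage.cql.keyspace=janusgraph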
Each transaction inserts ~10 records, and each indexer runs 10 transactions in parallel. We tried different things, but without success:
The most successful test so far was to switch the CQL write consistency level to ANY and the read consistency level to ONE:
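In the properties file those two settings look like this (standard CQL option names):

    storage.cql.write-consistency-level=ANY
    storage.cql.read-consistency-level=ONE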
We don't use LOCK consistency. We never update vertices or edges, so FORK consistency doesn't look useful in our case either. It really looks like something somewhere becomes a bottleneck when we start scaling.

I checked out the JanusGraph GitHub repo locally and went through it to try to understand which set of operations JanusGraph performs to insert vertices/edges and how this binds to transactions, but I'm struggling a little to find that information.

So, any ideas/recommendations?

Cheers,
Hi,
I didn't check your metrics, but my first impression was that this might be related to the internal thread pool. Can you try out the 0.6.0 pre-release version or the master version? Remember to set `storage.cql.executor-service.enabled` to false. Before 0.6.0, an internal thread pool with a hard-coded 15 threads was used to process all CQL queries. https://github.com/JanusGraph/janusgraph/pull/2700 made the thread pool configurable, and https://github.com/JanusGraph/janusgraph/pull/2741 further made it optional.

EDIT: Just realized your problem is related to the horizontal scaling of JanusGraph instances. Then this internal thread pool is likely not the cause - but it is still worth trying.

Hope this helps.

Best,
Boxuan
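P.S. In the properties file that toggle is a single line (0.6.0+ only):

    storage.cql.executor-service.enabled=false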
madams@...
Hi Boxuan,

I can definitely try the 0.6.0 pre-release or the master version, that's a good idea.

Thanks!
hadoopmarc@...
Hi Marc,
Just to be sure: the indexers themselves are not limited in the number of CPUs they can grab? Do the 60 indexers run on the same machine, or in independent cloud containers? If the indexers are not CPU-limited, it would be interesting to log where the time is spent: their own Java code, waiting for transactions to complete, or waiting for the id manager to return id-blocks?

Best wishes,
Marc
madams@...
Hi Marc,

We're running on Kubernetes, and there are no CPU limitations on the indexers. From left to right, top to bottom:
The grey areas represent the moment when the overall performance stopped scaling linearly with the number of indexers. We're not maxing out the CPUs yet, so it looks like we can still push the cluster. I don't have the IO waiting time per indexer unfortunately, but the node-exporter metric on IO waiting time coincides with the grey areas in the graphs.

As you mentioned the id block allocation, I checked the logs for warning messages, and they are indeed id allocation warnings; I looked for other warning messages but didn't find any. I tried increasing the id block size to 10 000 000 but didn't see any improvement - that said, from my understanding of the id allocation it is the perfect suspect. I'll rerun these tests on a completely fresh graph with ids.block-size=10000000 to double-check. If that does not work, I'll try upgrading to the master version and rerunning the test.

Any tip on how to log which part is slowing down the insertion? I was thinking maybe of using org.janusgraph.util.stats.MetricManager to time the execution of parts of the org.janusgraph.graphdb.database.StandardJanusGraph.commit() method (a simpler timing sketch is at the end of this message).

Thanks a lot,
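Here is the kind of minimal timing I have in mind, done from the indexer side rather than inside StandardJanusGraph.commit(); the "record" label and "seq" property are placeholders, not our actual schema:

    import org.janusgraph.core.JanusGraph;
    import org.janusgraph.core.JanusGraphTransaction;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    // Times the build phase and the commit phase of one transaction separately,
    // so slow commits (e.g. waiting on id-block allocation or on the storage
    // backend) show up in the indexer logs.
    public class TimedInsert {
        private static final Logger LOG = LoggerFactory.getLogger(TimedInsert.class);

        public static void insertBatch(JanusGraph graph, int records) {
            JanusGraphTransaction tx = graph.newTransaction();
            long t0 = System.nanoTime();
            for (int i = 0; i < records; i++) {
                // placeholder payload; the real indexer builds its vertices/edges here
                tx.addVertex("record").property("seq", i);
            }
            long t1 = System.nanoTime();
            tx.commit();
            long t2 = System.nanoTime();
            LOG.info("build={} ms, commit={} ms",
                    (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
        }
    }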
hadoopmarc@...
Hi Marc,
If you know how to handle MetricManager, that sounds fine. I was thinking in more basic terms: adding some log statements to your indexer Java code.

Regarding the id block allocation, some features seem to have been added which are still largely undocumented, see:
https://github.com/JanusGraph/janusgraph/blob/83c93fe717453ec31086ca1a208217a747ebd1a8/janusgraph-core/src/main/java/org/janusgraph/diskstorage/idmanagement/ConflictAvoidanceMode.java
https://docs.janusgraph.org/basics/janusgraph-cfg/#idsauthority

Notice that the default value for ids.authority.conflict-avoidance-mode is NONE. Given the rigor you show in your attempts, trying other values seems worth a try too!

Best wishes,
Marc
hadoopmarc@...
Just one more thing to rule out: did you set cpu.request and cpu.limit of the indexer containers to the same value? You want the pods to be really independent for this test.
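For example, a sketch of the relevant container spec fragment (the cpu value itself is arbitrary):

    resources:
      requests:
        cpu: "2"
      limits:
        cpu: "2"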
madams@...
Hi Marc,

I tried rerunning the scaling test on a fresh graph with ids.block-size=10000000, but unfortunately I haven't seen any performance gain. I also tried ids.block-size=10000000 together with ids.authority.conflict-avoidance-mode=GLOBAL_AUTO, but there was no performance gain there either.
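For reference, that amounts to these two lines in the graph properties (values as above):

    ids.block-size=10000000
    # GLOBAL_AUTO gives each instance a random tag, which (as I understand it)
    # spreads concurrent id-block claims over different keys
    ids.authority.conflict-avoidance-mode=GLOBAL_AUTO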
I tried something else which turned out to be very successful: instead of inserting all the properties into the graph, I only inserted the ones necessary to feed the composite indexes and vertex-centric indexes. Those indexes are what we use to efficiently execute the "get element or create it" logic (a sketch of that pattern is below). This test scaled quite nicely up to 64 indexers (instead of 4 before)!
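For context, the "get element or create it" logic is the usual fold/coalesce pattern sketched below; the "record" label and "recordId" property are placeholders rather than our actual schema, and the has() step is what the composite index has to serve:

    import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
    import org.apache.tinkerpop.gremlin.structure.Vertex;
    import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.addV;
    import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.unfold;

    public class GetOrCreate {
        // Returns the existing vertex with the given recordId, or creates it.
        // The has() lookup is backed by a composite index on the recordId key.
        public static Vertex getOrCreate(GraphTraversalSource g, String recordId) {
            return g.V().has("record", "recordId", recordId)
                    .fold()
                    .coalesce(unfold(),
                              addV("record").property("recordId", recordId))
                    .next();
        }
    }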
My best guess for why this is the case is that it reduced the amount of work the ScyllaDB coordinators had to do by:
I would happily continue digging into this, but unfortunately other priorities have turned up, so we're putting this testing aside for the moment. I thought I would post my complete findings/guesses anyway in case they are useful to someone.
Thank you so much for your help!
hadoopmarc@...
Nice work!