Re: CQL scaling limit?


Hi Marc,

We're running on Kubernetes, and there are no CPU limits on the indexer.
Thanks for pointing this out; I actually hadn't checked the overall resources of the cluster. It's a good sanity check:

From left to right, top to bottom:

  • Total CPU usage per node in the cluster. There are about 20 nodes, and they also run non-JanusGraph applications, which is why usage isn't zero when the tests aren't running.
  • Processed records per indexer. This one panel shows two tests:
    1. The first group is the scaling test with CQL consistency QUORUM.
    2. The second group is the scaling test with write consistency ANY and read consistency ONE.
  • I/O wait time per node in the cluster (unfortunately I don't have this metric per indexer).
  • ID block allocation warning log events (the exact log message looks like "Temporary storage exception while acquiring id block - retrying in PT2.4S: {}").
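To compare the two test groups, it may help to recall how many replica acknowledgements each CQL consistency level waits for. The sketch below encodes the standard Cassandra rules for a single data center. The replication factor of 3 is an assumption, since the keyspace settings aren't stated here, and ANY applies to writes only.

```java
/** Sketch: replica acknowledgements required per CQL consistency level (single data center). */
public class ConsistencyAcks {

    /** QUORUM is a strict majority of the replication factor. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** Number of replica acks a request must collect before it succeeds. */
    static int requiredAcks(String level, int replicationFactor) {
        switch (level) {
            case "ANY": // writes only: any node, even a coordinator storing a hint, is enough
            case "ONE":
                return 1;
            case "QUORUM":
                return quorum(replicationFactor);
            case "ALL":
                return replicationFactor;
            default:
                throw new IllegalArgumentException("unsupported level: " + level);
        }
    }

    public static void main(String[] args) {
        int rf = 3; // hypothetical replication factor, not stated in the post
        for (String level : new String[] {"ANY", "ONE", "QUORUM", "ALL"}) {
            System.out.println(level + " -> " + requiredAcks(level, rf) + " of " + rf);
        }
    }
}
```

With RF 3, QUORUM waits for two replicas on every write and read, while ANY/ONE return after the first, so the second test group puts less coordination load on the storage cluster per request.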

The grey areas represent the points at which overall throughput stopped scaling linearly with the number of indexers.

We're not maxing out the CPUs yet, so it looks like we can still push the cluster. I don't have I/O wait time per indexer, but the node-exporter I/O wait metric lines up with the grey areas in the graphs.
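One way to place the grey areas less by eye is to compute parallel efficiency, throughput(n) / (n * throughput(1)), and flag the first indexer count where it drops below a threshold. A minimal sketch, assuming throughput was measured at every indexer count from 1 to N; the 0.9 threshold and the sample numbers are hypothetical, not taken from the graphs.

```java
import java.util.OptionalInt;

/** Sketch: find the first indexer count at which throughput stops scaling linearly. */
public class ScalingKnee {

    /**
     * throughput[i] is the measured records/s with i + 1 indexers.
     * Returns the smallest indexer count whose efficiency falls below the threshold.
     */
    static OptionalInt firstSublinear(double[] throughput, double threshold) {
        if (throughput.length == 0) {
            return OptionalInt.empty();
        }
        double singleIndexer = throughput[0]; // baseline: one indexer
        for (int i = 0; i < throughput.length; i++) {
            int indexers = i + 1;
            double efficiency = throughput[i] / (indexers * singleIndexer);
            if (efficiency < threshold) {
                return OptionalInt.of(indexers);
            }
        }
        return OptionalInt.empty(); // scaled (near) linearly across the whole range
    }

    public static void main(String[] args) {
        // Hypothetical measurements for illustration only.
        double[] recordsPerSecond = {1000, 1980, 2950, 3900, 4100, 4200};
        System.out.println(firstSublinear(recordsPerSecond, 0.9));
    }
}
```

Running the same check per test group (QUORUM vs ANY/ONE) would show whether the knee moves with the consistency level.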

Since you mentioned ID block allocation, I checked the logs for warnings: they are indeed ID allocation warnings, and I didn't find any other warning messages.
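To make the log check repeatable, warnings can be tallied by message type. A rough sketch, assuming a hypothetical log layout of `<timestamp> WARN <logger> - <message>` (adjust to the real pattern). It drops everything after the first ": " and masks ISO-8601 retry delays such as PT2.4S, so all retries of the ID block warning land in one bucket. The logger name and sample lines are made up.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: count WARN log lines per message type (assumed log layout, see above). */
public class WarnTally {

    private static final String MARKER = " WARN ";

    static Map<String, Integer> tally(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            int at = line.indexOf(MARKER);
            if (at < 0) {
                continue; // not a warning line
            }
            String message = line.substring(at + MARKER.length());
            int colon = message.indexOf(": ");
            if (colon >= 0) {
                message = message.substring(0, colon); // drop the variable exception detail
            }
            // Mask ISO-8601 durations such as PT2.4S so retries share one key.
            message = message.replaceAll("PT[0-9.]+S", "PT?S");
            counts.merge(message.trim(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = List.of( // fabricated sample lines
            "2021-01-01 10:00:00 WARN SomeLogger - Temporary storage exception while acquiring id block - retrying in PT2.4S: timeout",
            "2021-01-01 10:00:03 WARN SomeLogger - Temporary storage exception while acquiring id block - retrying in PT1.2S: timeout",
            "2021-01-01 10:00:04 INFO SomeLogger - some informational message");
        tally(sample).forEach((msg, n) -> System.out.println(n + "  " + msg));
    }
}
```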

I tried increasing the ID block size to 10,000,000 but didn't see any improvement. That said, from my understanding of ID allocation, it is the prime suspect. I'll rerun these tests on a completely fresh graph with ids.block-size=10000000 to double-check.
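A back-of-envelope estimate of why a larger block size should cut ID allocation traffic: if each indexer consumes IDs at a steady rate and requests a new block only when its current one runs out, the cluster-wide request rate is indexers * IDs per second per indexer / block size. This is a simplification under my own assumptions; as I understand it, JanusGraph allocates blocks per partition and per ID namespace and can renew them early, so real request counts will be higher. All numbers below are hypothetical. Also, if I read the configuration reference correctly, ids.block-size is a GLOBAL_OFFLINE option, so a value changed only in a local properties file may not take effect on an existing graph, which is one more reason to retest on a fresh one.

```java
/** Back-of-envelope sketch: cluster-wide ID block requests per second (simplified model). */
public class IdBlockEstimate {

    /** Assumes one block in use per indexer, replaced only when exhausted. */
    static double blockRequestsPerSecond(int indexers, double idsPerSecondPerIndexer, long blockSize) {
        return indexers * idsPerSecondPerIndexer / blockSize;
    }

    public static void main(String[] args) {
        int indexers = 20;            // hypothetical
        double idsPerSecond = 50_000; // hypothetical IDs consumed per indexer per second
        for (long blockSize : new long[] {10_000, 10_000_000}) {
            double rate = blockRequestsPerSecond(indexers, idsPerSecond, blockSize);
            System.out.printf("block size %,d -> %.4f block requests/s cluster-wide%n", blockSize, rate);
        }
    }
}
```

Under this model a 1000x larger block size means 1000x fewer allocation round trips, so if throughput stays flat after the setting has demonstrably taken effect, the bottleneck may sit elsewhere in the write path.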

If that doesn't help, I'll try upgrading to the master branch and rerun the test. Any tips on how to log which part is slowing down insertion? I was thinking of using org.janusgraph.util.stats.MetricManager to time individual parts of the org.janusgraph.graphdb.database.StandardJanusGraph.commit() method.
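On the timing idea: as far as I know, MetricManager is built on Dropwizard (Codahale) Metrics, whose Timer hands out a closeable context per measured section. The sketch below mimics that pattern with only the JDK so it runs without dependencies. PhaseTimer and the phase names are hypothetical stand-ins, not JanusGraph APIs, and the Thread.sleep calls stand in for real work inside commit().

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Sketch: accumulate wall-clock time and call counts per named phase, thread-safely. */
public class PhaseTimer {

    private final Map<String, LongAdder> nanosByPhase = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> callsByPhase = new ConcurrentHashMap<>();

    /** Records elapsed time for its phase when closed; use with try-with-resources. */
    public final class Span implements AutoCloseable {
        private final String phase;
        private final long start = System.nanoTime();

        private Span(String phase) {
            this.phase = phase;
        }

        @Override
        public void close() {
            long elapsed = System.nanoTime() - start;
            nanosByPhase.computeIfAbsent(phase, k -> new LongAdder()).add(elapsed);
            callsByPhase.computeIfAbsent(phase, k -> new LongAdder()).increment();
        }
    }

    public Span time(String phase) {
        return new Span(phase);
    }

    public long calls(String phase) {
        LongAdder n = callsByPhase.get(phase);
        return n == null ? 0 : n.sum();
    }

    /** Snapshot of total milliseconds per phase, sorted by phase name. */
    public Map<String, Long> totalMillis() {
        Map<String, Long> out = new TreeMap<>();
        nanosByPhase.forEach((phase, nanos) -> out.put(phase, nanos.sum() / 1_000_000));
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        PhaseTimer timer = new PhaseTimer();
        for (int i = 0; i < 3; i++) {
            try (Span s = timer.time("prepare-mutations")) {
                Thread.sleep(5); // stand-in work
            }
            try (Span s = timer.time("storage-write")) {
                Thread.sleep(20); // stand-in work
            }
        }
        System.out.println(timer.totalMillis());
    }
}
```

LongAdder and ConcurrentHashMap keep the counters cheap under contention, since many transactions commit concurrently. Comparing per-phase totals at indexer counts on either side of a grey area should show which phase grows once scaling stops being linear.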

Thanks a lot,
