Re: CQL scaling limit?


madams@...
 

Hi Marc,

I reran the scaling test on a fresh graph with ids.block-size=10000000, but unfortunately I didn't see any performance gain.

I also tried ids.block-size=10000000 together with ids.authority.conflict-avoidance-mode=GLOBAL_AUTO, but again there was no performance gain.
I used GLOBAL_AUTO because it was the easiest to test. I ran the test twice to make sure the result was not just due to an unlucky random tag assignment. I didn't do the math, but I guess I would have to be very unlucky to get a very bad random tag allocation twice!
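
For anyone who wants to reproduce this, the two ID allocation settings would look roughly like this in the JanusGraph properties file. This is only a sketch: the rest of the configuration (storage backend, hostnames, and so on) is omitted and depends on your setup.

```properties
# ID allocation settings used in the rerun (values taken from this thread).
# Each instance reserves element IDs in blocks of this size.
ids.block-size=10000000

# Let each instance pick a random conflict-avoidance tag,
# so that ID block allocation contends less across instances.
ids.authority.conflict-avoidance-mode=GLOBAL_AUTO
```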

 

I tried something else, which turned out to be very successful:

Instead of inserting all the properties into the graph, I inserted only the ones needed to feed the composite indexes and vertex-centric indexes. The indexes are used to efficiently execute the "get element or create it" logic. This test scaled quite nicely up to 64 indexers (instead of 4 before)!




Of all the tests I have tried so far, the two most successful were:

  1. decreasing the CQL consistency level (from QUORUM to ANY/ONE)
  2. decreasing the number of properties
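
As a rough sketch, the consistency change in item 1 can be expressed with the CQL backend's consistency options in the JanusGraph properties file. Which level goes on reads and which on writes is an assumption here: ANY is only valid for writes, so ONE is shown for reads.

```properties
# Both options default to QUORUM in JanusGraph's CQL backend.
# Assumption: ONE for reads, ONE (or ANY) for writes.
storage.cql.read-consistency-level=ONE
storage.cql.write-consistency-level=ONE
# storage.cql.write-consistency-level=ANY
```

Keep in mind that lowering these levels trades consistency guarantees for less coordination work, so it may not be acceptable for every workload.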


What's interesting about these two cases is that they didn't significantly increase the performance of a single indexer; rather, they increased the horizontal scalability we could achieve.

My best guess as to why: they reduced the amount of work the ScyllaDB coordinators had to do by:

  1. decreasing the amount of coordination necessary to get a majority answer (QUORUM)
  2. decreasing the size in bytes of the CQL unlogged batches; some of our properties can be quite big (> 1 KB)

I would happily continue digging into this, but unfortunately other priorities have turned up, so we're putting the testing aside for the moment.

I thought I would post my complete findings and guesses anyway, in case they are useful to someone.

 

Thank you so much for your help!
Cheers,
Marc
