[Performance Optimization] Optimization around the `system_properties` table interaction


Hi all

- The JanusGraph client's interaction with the underlying KV store hits the `system_properties` table with a range query where the key is `configuration` (key = 0x636f6e66696775726174696f6e)

- We observed that the JanusGraph client stores all configurations (static + dynamic) against the `configuration` key

- When we run the job with Spark executors, where each executor uses JanusGraph in embedded mode, each executor creates its own executor-level (dynamic) entries under the same `configuration` key

- Thus, as the number of executors increases, the partition with the key `configuration` becomes a large partition, and queries with key = `configuration` turn into range queries that scan this large partition, as seen in the graphs below (taken from the Scylla Monitoring Grafana dashboard)

- I would like to know if there is a fix in progress for this issue. We at zeotap are using JanusGraph at tremendous scale (single graphs with 70 billion vertices and 50 billion edges) and have identified a couple of solutions to fix this
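To make the growth pattern described above concrete, here is a minimal toy model in Python. It only assumes what the bullets state: one partition key, `configuration`, whose hex form matches the key quoted above, and one set of dynamic entries per embedded JanusGraph instance. The column names (`static.prop<i>`, `instance-<e>.startup-time`) and the count of 20 static properties are hypothetical placeholders, not JanusGraph's actual storage format.

```python
from collections import defaultdict

# Partition key used in system_properties, as described in the post.
CONFIG_KEY = "configuration"

# Hex form of the key, as it appears in ScyllaDB tooling.
print(CONFIG_KEY.encode("utf-8").hex())  # 636f6e66696775726174696f6e


def simulate_partition(num_executors: int, static_props: int = 20) -> dict:
    """Toy model of system_properties: {partition key: {column: value}}.

    Static properties are written once; every embedded JanusGraph instance
    (one per Spark executor) adds its own dynamic column under the SAME key.
    Column names are hypothetical placeholders, not JanusGraph's real format.
    """
    table = defaultdict(dict)
    for i in range(static_props):
        table[CONFIG_KEY][f"static.prop{i}"] = "value"
    for e in range(num_executors):
        table[CONFIG_KEY][f"instance-{e}.startup-time"] = "timestamp"
    return table


for n in (10, 100, 1000):
    table = simulate_partition(n)
    # Still exactly one partition, but its column count grows with n.
    print(f"executors={n}: partitions={len(table)}, columns={len(table[CONFIG_KEY])}")
```

However many executors run, there is only ever one partition, so every query on `configuration` lands on the same replicas and has to read an ever-wider row.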

Saurabh Verma


Hi Saurabh!

Thanks for reporting this issue! Looking at the open pull requests, I don't see one that addresses this problem. You're always welcome to share your solutions by discussing them here or by submitting PRs directly. Do you already have these fixes in place and use them in your production environment, or are they still at an early draft stage?


On Mon, Feb 15, 2021 at 05:59 AM, sauverma wrote:
Hi Saurabh,
We are experiencing exactly the same issue: a Spark job with one JanusGraph instance per partition calling a ScyllaDB cluster. Our symptom is a large number of timeout exceptions because read queries fail to reach QUORUM.

If you want to share your proposed solutions, we will be happy to try them.


Thank you, folks, for getting back to me.

@Simone3, yes, this issue surfaces as read timeouts from the shard holding the `system_properties` table (there is only one, unreplicated partition for `system_properties`).

Based on these observations, we are using the workarounds below to bypass the issue for now (the required code change in JanusGraph is currently under test):

- set `gc_grace_seconds` for `system_properties` to 0
- truncate the `system_properties` table periodically (say, every 2 hours)
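The two workarounds boil down to one schema change and one recurring statement. `gc_grace_seconds` controls how long tombstones (markers for deleted data) are kept before compaction may purge them, so setting it to 0 helps to the extent that deleted entries are bloating the `configuration` partition. The sketch below only builds the CQL strings and schedules the truncation; it assumes JanusGraph's default CQL keyspace name `janusgraph`, and `run_periodically` and its `execute` callable are hypothetical helpers, not part of JanusGraph.

```python
import time
from typing import Callable

# Assumption: JanusGraph's default CQL keyspace; change it to match your
# storage.cql.keyspace setting.
KEYSPACE = "janusgraph"
TABLE = "system_properties"
TWO_HOURS_S = 2 * 60 * 60


def gc_grace_statement(keyspace: str = KEYSPACE) -> str:
    """One-off change: let compaction drop tombstones without a grace period."""
    return f"ALTER TABLE {keyspace}.{TABLE} WITH gc_grace_seconds = 0;"


def truncate_statement(keyspace: str = KEYSPACE) -> str:
    """Periodic cleanup: removes every row in system_properties."""
    return f"TRUNCATE {keyspace}.{TABLE};"


def run_periodically(execute: Callable[[str], object],
                     interval_s: float = TWO_HOURS_S,
                     rounds: int = 1) -> None:
    """Issue TRUNCATE `rounds` times, sleeping `interval_s` between calls.

    `execute` is any callable that runs a CQL string, for example a
    driver session's execute method (hypothetical wiring).
    """
    for i in range(rounds):
        execute(truncate_statement())
        if i < rounds - 1:
            time.sleep(interval_s)
```

With the DataStax Python driver, `execute` could be `session.execute`; a cron job running `cqlsh -e` with the same statement achieves the same thing. Note that `TRUNCATE` removes all rows in the table, not only the dynamic entries.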



Hi @sauverma,
Nice, we are truncating the table too. It's good to have your confirmation that this safely works around the issue.
We will look into the `gc_grace_seconds` setting now.

Just out of curiosity: are you fixing it by changing `KCVSConfiguration` to store properties as rows instead of columns?


On Mon, Feb 22, 2021 at 01:58 AM, <simone3.cattani@...> wrote:
Hi @simone

We are planning to separate the static configuration from the dynamic runtime entries (last-update timestamp from each client, etc.).
AFAIK, only the static configuration is required by JanusGraph clients during initialization.
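A minimal in-memory sketch of that idea, assuming (as stated above) that clients only need the static entries at startup. The per-instance key scheme `runtime:<instance_id>` and the `SegregatedConfigStore` class are hypothetical illustrations, not JanusGraph's `KCVSConfiguration` API or the actual planned change.

```python
from collections import defaultdict

STATIC_KEY = "configuration"


def runtime_key(instance_id: str) -> str:
    # Hypothetical key scheme: one partition per client instance.
    return f"runtime:{instance_id}"


class SegregatedConfigStore:
    """Toy model: static config stays under one key, while runtime entries
    (e.g. a last-update timestamp) go to per-instance partition keys."""

    def __init__(self) -> None:
        self.partitions = defaultdict(dict)

    def set_static(self, name: str, value: object) -> None:
        self.partitions[STATIC_KEY][name] = value

    def set_runtime(self, instance_id: str, name: str, value: object) -> None:
        self.partitions[runtime_key(instance_id)][name] = value

    def load_for_init(self) -> dict:
        # Only the static partition is read at initialization, so its size
        # no longer depends on how many executors have registered.
        return dict(self.partitions.get(STATIC_KEY, {}))


store = SegregatedConfigStore()
for i in range(20):
    store.set_static(f"prop{i}", "value")
for e in range(1000):
    store.set_runtime(f"executor-{e}", "last-update", 0)
print(len(store.load_for_init()), len(store.partitions))
```

The design choice is that runtime writes spread across many small partitions (and therefore across the cluster), while the single partition read at startup stays the same size regardless of the number of Spark executors.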