[Performance Optimization] Optimization around the `system_properties` table interaction


sauverma
 

Hi all

- The interaction with the underlying KV store via the JanusGraph client hits the `system_properties` table with a range query where the key is `configuration` (key = 0x636f6e66696775726174696f6e)

- The observation is that the JanusGraph client stores all the configurations (static + dynamic) against the single `configuration` key

- When we run the job with Spark executors, where each executor uses JanusGraph in embedded mode, each of these executors creates executor-level (dynamic) entries under the same `configuration` key

- Thus, as the number of executors increases, the partition with key `configuration` grows into a large partition, and queries with key = `configuration` become range queries scanning that large partition, as seen in the graphs below (from the Scylla Monitoring Grafana dashboard); an illustrative CQL sketch follows this list

- I would like to know whether there is a fix in progress for this issue - we at zeotap are using JanusGraph at tremendous scale (single graphs having 70 billion vertices and 50 billion edges) and have identified a couple of solutions to fix this
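
For context, a minimal sketch of what this looks like at the CQL level, assuming the default `janusgraph` keyspace and the usual key/column1/value layout JanusGraph uses for its CQL tables (exact names may vary by version):

```
-- Assumed shape of the system_properties store in the CQL backend
-- (key = partition key, column1 = clustering key, value = property value).
CREATE TABLE janusgraph.system_properties (
    key     blob,
    column1 blob,
    value   blob,
    PRIMARY KEY (key, column1)
);

-- Every executor's slice query during initialization hits the same partition;
-- 0x636f6e66696775726174696f6e is the ASCII encoding of 'configuration'.
SELECT column1, value
FROM janusgraph.system_properties
WHERE key = 0x636f6e66696775726174696f6e;
```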



Thanks
Saurabh Verma
+91 7976984604


rngcntr
 

Hi Saurabh!

Thanks for reporting that issue! Looking at the open pull requests, I don't see one that addresses this problem. You're always welcome to share your solutions by discussing them here or even submitting PRs directly. Do you already have these fixes in place and use them in your production environment, or is it rather an early-stage draft?


simone3.cattani@...
 

Hi Saurabh,
we are experiencing the exact same issue: a Spark job with one JanusGraph instance per partition calling a ScyllaDB cluster. Our symptom is a flood of timeout exceptions due to missing QUORUM on read queries.

If you want to share your proposed solutions, we will be happy to try them.


sauverma
 

Thank you folks for getting back.

@Simone3, yes, this issue shows up as read timeouts from the shard holding the `system_properties` table (there is only 1 partition for system_properties, unreplicated).

Based on the observations, we've used the workarounds below to bypass it for now (the code change required in JanusGraph is under test right now); a CQL sketch follows the list

- set the gc_grace_seconds for system_properties to 0
- truncate the system_properties table periodically (say, every 2 hours)
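
For reference, these two workarounds as plain CQL statements, assuming the default `janusgraph` keyspace name (adjust to your deployment):

```
-- Workaround 1: stop keeping tombstones around for this table.
ALTER TABLE janusgraph.system_properties WITH gc_grace_seconds = 0;

-- Workaround 2: run periodically (e.g. every 2 hours) from a scheduler.
-- Note: as discussed later in this thread, truncating while ingestion is
-- running turned out to cause graph corruption, so use with care.
TRUNCATE janusgraph.system_properties;
```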

Thanks


simone3.cattani@...
 

Hi @sauverma,
nice, we are truncating the table, too. It's good to have your confirmation that it can safely work around the issue.
We will analyze the `gc_grace_seconds` setting now.

Just out of curiosity: are you fixing it by changing `KCVSConfiguration` to store properties as rows instead of columns?


sauverma
 

Hi @simone

We were planning to segregate the static configurations from the runtime dynamic configurations (last update timestamp from the client, etc.).
AFAIK only the static configurations are required by the JanusGraph clients while initializing.

Thanks
Saurabh


sauverma
 

Hi all

Updates on this issue

- We found that the periodic removal of system_properties (while the ingestion is running) leads to graph corruption (mentioned at a high level at https://docs.janusgraph.org/advanced-topics/recovery/)
- The perf issues we saw were due to the reasons below
     - improper handling of Dataproc scale-down, which led to connections to JG not being closed and thus an ever-growing system_properties table
     - unbounded, unthrottled access to the Scylla caching layer, leading to other queries slowing down because of the single, hot system_properties partition
     - in addition to this, the data model for system_properties still needs to be fixed via clustering keys; by design system_properties has only ONE partition and all Spark executors hit it during initialization, leading to query slowdown -> query queuing -> query timeouts (a hypothetical sketch of a fixed layout follows this list)
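
To make the idea concrete, here is one possible layout. This is purely a hypothetical sketch, not the current JanusGraph schema: static configuration stays in one small partition read at initialization, while per-instance dynamic entries are spread across partitions keyed by an instance identifier (the table and column names below are invented for illustration):

```
-- Hypothetical: static config only, small and read once at init.
CREATE TABLE janusgraph.system_properties_static (
    key     blob,
    column1 blob,   -- property name
    value   blob,
    PRIMARY KEY (key, column1)
);

-- Hypothetical: per-instance dynamic entries, partitioned by instance id so
-- thousands of Spark executors no longer pile onto a single partition.
CREATE TABLE janusgraph.system_properties_dynamic (
    instance_id text,   -- e.g. one JanusGraph instance / Spark executor
    column1     blob,   -- property name (clustering key)
    value       blob,
    PRIMARY KEY (instance_id, column1)
);
```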

Thanks


Boxuan Li
 

Hi @sauverma,

I am just curious: I noticed you said "there is only 1 partition for system_properties unreplicated". Do you have storage.cql.replication-factor = 1?


sauverma
 

Hi Boxuan Li

We are using RF3.

What I meant is that essentially the data for system_properties is going to a single partition. Due to RF3, this partition is replicated 3 times.
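
To illustrate (a sketch only, assuming the default `janusgraph` keyspace and SimpleStrategy replication):

```
-- Illustration only: RF 3 replicates every partition to 3 nodes, but all rows
-- under key = 'configuration' still map to one token, i.e. one (replicated)
-- partition, so the hot spot remains.
CREATE KEYSPACE IF NOT EXISTS janusgraph
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
```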

Does this clarify?

Thanks
Saurabh


Boxuan Li
 

Yeah, that makes sense. I saw you said "unreplicated" and thus wondered. I am not familiar with how `system_properties` is handled, but I just want to point out that it is very difficult, if not impossible, to change the data model while keeping backward compatibility at the same time.
