Re: [Performance Optimization] Optimization around the `system_properties` table interaction


sauverma
 

Hi all

Updates on this issue:

- We found that periodically removing system_properties while ingestion is running leads to graph corruption (this is mentioned at a high level at https://docs.janusgraph.org/advanced-topics/recovery/)
- The perf issues we saw were due to the following reasons:
     - improper handling of Dataproc scale-down, which led to connections to JG not getting closed, and thus to an ever-growing system_properties table
     - unthrottled access to the Scylla caching layer, which slowed down other queries because system_properties is a single, hot partition
     - in addition, the data model for system_properties still needs to be fixed by using clustering keys. By design, system_properties has only 1 SINGLE partition, and all Spark executors hit it during initialization, leading to query slowdown -> query queuing -> query timeouts

Thanks
