Re: [Performance Optimization] Optimization around the `system_properties` table interaction
Hi all
Updates on this issue:

- We found that the periodic removal of system_properties (while ingestion is running) leads to graph corruption (mentioned at a high level at https://docs.janusgraph.org/advanced-topics/recovery/).
- The perf issues we saw were due to the following:
  - Improper handling of Dataproc scale-down, which left connections to JG unclosed and so caused the system_properties table to grow without bound.
  - Unbounded, i.e. unthrottled, access to the Scylla caching layer, which slowed down other queries because system_properties is a single, hot partition.
  - In addition, the data model for system_properties still needs to be fixed by using clustering keys. By design, system_properties has only a single partition, and all Spark executors hit it during initialization, leading to query slowdown -> query queuing -> query timeouts.

Thanks
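For anyone hitting the same connection-leak and unthrottled-access problems, here is a minimal Python sketch of the general idea, not the actual fix we deployed. It caps how many callers in one process open a graph connection at the same time and guarantees `close()` runs even when the work fails. `BoundedOpener`, `open_fn`, and `max_concurrent` are hypothetical names for illustration; `open_fn` stands in for whatever opens your JG connection and is assumed to return an object with a `close()` method.

```python
import threading
from contextlib import contextmanager


class BoundedOpener:
    """Caps concurrent connections in this process and always closes them.

    Hypothetical helper: open_fn is any zero-argument callable that returns
    an object exposing close().
    """

    def __init__(self, open_fn, max_concurrent):
        self._open_fn = open_fn
        # BoundedSemaphore raises if released more times than acquired,
        # which surfaces bookkeeping bugs early.
        self._slots = threading.BoundedSemaphore(max_concurrent)

    @contextmanager
    def connection(self):
        # Block until one of the max_concurrent slots is free.
        self._slots.acquire()
        try:
            conn = self._open_fn()
            try:
                yield conn
            finally:
                # Runs on normal exit and on exceptions, so the
                # connection is not left dangling.
                conn.close()
        finally:
            self._slots.release()
```

Usage would look like `with opener.connection() as g: ...` inside each task. Note that this only bounds concurrency within a single process; limiting access across all Spark executors would need a cluster-level mechanism, and a node that is killed abruptly on scale-down never reaches the `finally` block, so that case still needs separate handling.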