Date
1 - 1 of 1
Delays during commit processing
graham...@...
Hi, I posted a question a while back to the users list (here: https://groups.google.com/forum/#!topic/janusgraph-users/pGEZ0yMEP9k ) and subsequently did a bit of digging. In a nutshell, I am seeing approximately 10 second delays every time I commit. Since startup involves around 350 commits it can take up to an hour to start my server. I progressively added performance trace to a small number of classes involved in commit processing, and have found something that I really don't understand. To me it looks like a bug, but I could be wrong. Would appreciate any thoughts from the dev community. The most important classes I added trace to are CacheTransaction, BackendOperation and HBaseStoreManager. Here are the relevant snippets: I used log.error so that I would not need to enable full debug logging. CacheTransaction: private int persist(final Map<String, Map<StaticBuffer, KCVMutation>> subMutations) { BackendOperation public static final<V> V executeDirect(Callable<V> exe, Duration totalWaitTime) throws BackendException { HBaseStoreManager. public void mutateMany(Map<String, Map<StaticBuffer, KCVMutation>> mutations, StoreTransaction txh) throws BackendException { With the above logging in place, I narrowed down the cause of the time delays to the following. Note in particular the thread switch (to pool5-thread-1) and the 4 attempts to execute the operation. I was kind of expecting that they might be due to retries caused by waiting for HBase - but we don't get to the HBaseManager until the last line of this snippet. 2018-07-04 10:16:45,149 ERROR - [pool-1-thread-10 - cd79b8f7-4f8f-4a2f-a1ba-e0fded20b7bc:] ~ PERF: === CacheTransaction.persist callable mutateMany (CacheTransaction$1:104) 2018-07-04 10:16:49,489 ERROR - [pool-5-thread-1:] ~ PERF: === BackendOperation.executeDirect totalWaitTime=4000 ms, maxTime=1530695813489 (BackendOperation:65) 2018-07-04 10:16:49,489 ERROR - [pool-5-thread-1:] ~ PERF: === BackendOperation.executeDirect pre while (BackendOperation:68) 2018-07-04 10:16:49,489 ERROR - [pool-5-thread-1:] ~ PERF: === BackendOperation.executeDirect exe call (BackendOperation:71) 2018-07-04 10:16:54,494 ERROR - [pool-5-thread-1:] ~ PERF: === BackendOperation.executeDirect totalWaitTime=4000 ms, maxTime=1530695818494 (BackendOperation:65) 2018-07-04 10:16:54,494 ERROR - [pool-5-thread-1:] ~ PERF: === BackendOperation.executeDirect pre while (BackendOperation:68) 2018-07-04 10:16:54,494 ERROR - [pool-5-thread-1:] ~ PERF: === BackendOperation.executeDirect exe call (BackendOperation:71) 2018-07-04 10:16:55,130 ERROR - [pool-1-thread-10 - cd79b8f7-4f8f-4a2f-a1ba-e0fded20b7bc:] ~ PERF: === BackendOperation.executeDirect totalWaitTime=10000 ms, maxTime=1530695825130 (BackendOperation:65) 2018-07-04 10:16:55,132 ERROR - [pool-1-thread-10 - cd79b8f7-4f8f-4a2f-a1ba-e0fded20b7bc:] ~ PERF: === BackendOperation.executeDirect pre while (BackendOperation:68) 2018-07-04 10:16:55,133 ERROR - [pool-1-thread-10 - cd79b8f7-4f8f-4a2f-a1ba-e0fded20b7bc:] ~ PERF: === BackendOperation.executeDirect exe call (BackendOperation:71) 2018-07-04 10:16:55,136 ERROR - [pool-1-thread-10 - cd79b8f7-4f8f-4a2f-a1ba-e0fded20b7bc:] ~ PERF: === BackendOperation.executeDirect totalWaitTime=10000 ms, maxTime=1530695825136 (BackendOperation:65) 2018-07-04 10:16:55,137 ERROR - [pool-1-thread-10 - cd79b8f7-4f8f-4a2f-a1ba-e0fded20b7bc:] ~ PERF: === BackendOperation.executeDirect pre while (BackendOperation:68) 2018-07-04 10:16:55,138 ERROR - [pool-1-thread-10 - cd79b8f7-4f8f-4a2f-a1ba-e0fded20b7bc:] ~ PERF: === BackendOperation.executeDirect exe call (BackendOperation:71) 2018-07-04 10:16:55,141 ERROR - [pool-1-thread-10 - cd79b8f7-4f8f-4a2f-a1ba-e0fded20b7bc:] ~ HBaseStoreManager: mutateMany, begin (HBaseStoreManager:414) So I am puzzled as to why there appear to be 4 attempts to invoke the mutateMany operation, with wait times of 4 seconds, 4 seconds, 10 seconds and 10 seconds. Why any retries? Why the 4, 4, 10, 10 escalation? (I have examined the wait time perturbation and still don't get this). But most importantly I am puzzling over why CacheTransaction creates a Callable and asks BackendOperation to execute it directly instead of using the ExecutorService and Futures. I can't see exactly what is causing the thread switch - but it appears to be redundant and costs a chunk of time each time we switch threads. I realize that this is performance critical code so there may be a great reason that I am not aware of yet. If not, then I am happy to experiment with converting this to using the ExecutorService. Best regards Graham |
|