Transaction Cache vs. DB Cache Questions
Joseph Kesting
Hello! I am currently working on a project that computes a two-hop query for several million vertices. To speed up these queries I would like to use caching, but I am having trouble finding exact documentation on what is stored in the DB cache versus the transaction cache. The query that I am executing traverses all nodes within a two-hop network and then extracts a property from all vertices in that network. Currently these queries run in different threads that share the DB cache but execute separate transactions, and I am not seeing the cache performance I had hoped for. Is the property I am trying to fetch cached in the DB cache, or is the DB cache only used to maintain adjacency lists? Additionally, if I refactored these threads to share a common transaction, would that property be cached in the transaction cache? Thanks for your assistance! Joe
hadoopmarc@...
Hi Joe,
Good question, and I do not know the answer. Indeed, the documentation suggests that the DB cache stores less information than the transaction cache, but it is not explicit about vertex properties. It is not explicit about vertex properties in the transaction cache either, but I cannot remember users having problems with missing vertex properties there. TinkerPop/JanusGraph support multi-threaded transactions. When using these (maybe you already suggested this in your final line), you can be sure that vertices are available from the transaction cache, provided its configs match your traversal. Best wishes, Marc
Boxuan Li
Hi Joe, I believe the “adjacency lists” wording used in https://docs.janusgraph.org/basics/cache/ actually refers to vertices together with their properties (including meta-properties) and edges together with their properties. If you refactor your code so that multiple threads share a common transaction, then yes, the properties will be stored in the transaction cache. That cache is not based on thread-local objects, so multi-threading does not harm the cache here. Regarding performance, you may need to tune your configs, e.g. try increasing cache.db-cache-size to reduce the chance of frequent cache eviction. Best regards, Boxuan
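As a rough sketch of the tuning mentioned above, a JanusGraph properties file might set the cache options as follows. The option names are taken from the JanusGraph configuration reference; the values are illustrative assumptions rather than recommendations, and should be sized against your own heap and workload.

```properties
# Illustrative values only; tune against your own heap size and workload.
cache.db-cache = true
# Values between 0 and 1 are read as a fraction of the JVM heap; larger values as bytes.
cache.db-cache-size = 0.5
# Expiration time for entries in the DB cache, in milliseconds.
cache.db-cache-time = 180000
# Maximum number of recently used vertices kept in one transaction's cache.
cache.tx-cache-size = 50000
```

Note that cache.db-cache-size governs the cache shared across transactions, while cache.tx-cache-size applies per transaction, so the latter matters most when many threads share one transaction.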
Hi Joe,
Just as Boxuan already said, the cache size is crucial for this task. But assuming your graph is large, only a fraction of the vertices will fit into the cache even if it is scaled appropriately. The problem I see is that, for large graphs, the chance of finding a vertex in the cache is small if you iterate over your queries in random order. If you can come up with an execution order in which vertices with similar two-hop neighborhoods are processed close together in time, that would greatly improve the cache hit rate. Best regards, Florian
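The effect of execution order on hit rate can be illustrated with a toy simulation. This is only a sketch under strong assumptions and does not touch JanusGraph: the graph is a hypothetical ring in which each vertex links to its k nearest neighbors on each side, the cache is a plain LRU cache (the real DB cache may use a different eviction policy), and names such as LRUCache, two_hop, and run are made up for this example.

```python
import random
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache that counts hits and misses (a stand-in for a bounded cache)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def fetch(self, vertex):
        if vertex in self.entries:
            # Mark the entry as most recently used.
            self.entries.move_to_end(vertex)
            self.hits += 1
        else:
            self.misses += 1
            self.entries[vertex] = True
            if len(self.entries) > self.capacity:
                # Evict the least recently used entry.
                self.entries.popitem(last=False)

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def two_hop(vertex, n, k):
    """Two-hop neighborhood (including the start vertex) on the assumed ring graph."""
    return [(vertex + d) % n for d in range(-2 * k, 2 * k + 1)]


def run(order, n, k, capacity):
    """Run one two-hop query per start vertex in the given order; return the hit rate."""
    cache = LRUCache(capacity)
    for start in order:
        for v in two_hop(start, n, k):
            cache.fetch(v)
    return cache.hit_rate()


n, k, capacity = 10_000, 5, 200  # the cache holds only 2% of the vertices

random_order = list(range(n))
random.Random(42).shuffle(random_order)
locality_order = list(range(n))  # neighbors on the ring are processed back to back

random_hit_rate = run(random_order, n, k, capacity)
locality_hit_rate = run(locality_order, n, k, capacity)
print(f"random order:   {random_hit_rate:.2%}")
print(f"locality order: {locality_hit_rate:.2%}")
```

In this toy setup consecutive start vertices share almost their entire two-hop neighborhood, so the ordered run reuses nearly every cached vertex while the shuffled run rarely does. On a real graph a comparable ordering has to be derived from the data, for example by grouping start vertices by community or by a shared neighbor.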