JanusGraph database cache in a distributed setup
In a multi-node JanusGraph cluster, a data modification made on one instance does not sync with the others until the cache entry reaches its expiry time (cache.db-cache-time).
As per the documentation [1], enabling the database-level cache is not recommended in a distributed setup because cached data is not shared among instances.
Any suggestions for a solution or workaround where I can see data changes from other JG instances immediately and avoid stale data access?
[1] https://docs.janusgraph.org/operations/cache/#cache-expiration-time
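For context, a minimal sketch of the settings involved, using the embedded JanusGraphFactory builder; the storage backend and the values shown are illustrative only, not recommendations:

```java
// Illustrative only: the cache settings discussed in this thread, set when
// opening an embedded JanusGraph instance. Values are examples, not advice.
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

public class OpenGraphWithDbCache {
    public static JanusGraph open() {
        return JanusGraphFactory.build()
                .set("storage.backend", "cql")       // assumed backend for this sketch
                .set("storage.hostname", "127.0.0.1")
                .set("cache.db-cache", true)         // enable the database-level cache
                .set("cache.db-cache-time", 180000)  // ms before a cached entry expires
                .set("cache.db-cache-size", 0.25)    // fraction of heap given to the cache
                .open();
    }
}
```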
https://www.scylladb.com/2017/09/18/scylla-2-0-workload-conditioning/
You only run into trouble with queries that ask for tens or hundreds of vertices, but you can ask whether it is reasonable to require millisecond-level real-time results for such large queries.
Best wishes, Marc
Actually the concern is with the db.cache feature.
Once we enable db.cache, any modification made to a particular vertex is visible only on the JG instance that made it, until the cache expires. So if we have multiple JG instances, modifications made on one instance are not reflected on the others immediately.
If we can have a centralized cache which syncs across all JG instances, this can be avoided.
Thanks, Wasantha
Hi Boxuan,
I am evaluating the approach of rewriting ExpirationKCVSCache as suggested. There I could replace the existing Guava cache implementation with one that connects to a remote Redis DB, so that Redis acts as a centralized cache shared by all JG instances.
While going through the JanusGraph source, I found the same Guava cache construction (cachebuilder = CacheBuilder.newBuilder()) used in several other places, e.g. GuavaSubqueryCache, GuavaVertexCache, ...
Will it be sufficient to modify only ExpirationKCVSCache, or do we need to modify these other places as well?
Thanks
Wasantha
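For illustration, a rough sketch of the idea being evaluated here: backing the query-result cache with Redis (via the Jedis client) instead of an in-process Guava cache, so every JG instance reads and invalidates the same entries. The class and method names below are made up for this sketch and are not JanusGraph internals; serializing an EntryList to bytes is assumed to happen elsewhere.

```java
// Hypothetical sketch: a remote, shared query-result cache backed by Redis.
// Not JanusGraph's actual API; the value bytes stand for a serialized EntryList.
import java.util.concurrent.Callable;
import redis.clients.jedis.JedisPooled;

public class SharedQueryCache {

    private final JedisPooled redis;
    private final long ttlSeconds;

    public SharedQueryCache(String redisHost, int redisPort, long ttlSeconds) {
        this.redis = new JedisPooled(redisHost, redisPort);
        this.ttlSeconds = ttlSeconds;
    }

    /** Return the cached raw result for a query key, loading from the backend on a miss. */
    public byte[] getOrLoad(byte[] queryKey, Callable<byte[]> backendLoader) throws Exception {
        byte[] cached = redis.get(queryKey);
        if (cached != null) {
            return cached;
        }
        byte[] loaded = backendLoader.call();
        redis.setex(queryKey, ttlSeconds, loaded);  // expiry still applies, but is shared
        return loaded;
    }

    /** Invalidate an entry for all instances at once, e.g. after a mutation commits. */
    public void invalidate(byte[] queryKey) {
        redis.del(queryKey);
    }
}
```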
Hi Wasantha,
It's great to see that you have made some progress. If possible, it would be awesome if you could contribute your implementation to the community!
Yes, modifying `ExpirationKCVSCache` is enough. `GuavaSubqueryCache` and `GuavaVertexCache` are transaction-wise caches, so you don't want to make them global. `RelationQueryCache` and `SchemaCache` are graph-wise caches; you could make them global, but it is not necessary since they only store schema rather than real data. In fact, I would recommend not doing so, because JanusGraph already has a mechanism for evicting stale schema cache entries.
Best,
Boxuan
Hi Boxuan,
I was able to change the ExpirationKCVSCache class to persist the cache in a Redis DB, but I can still see some data anomalies between two JG instances. For example, when I change a property of a vertex on one JG instance (instance A), e.g. [ g.V(40964200).property('data', 'some_other_value') ], the change does not reflect on the other JG instance (instance B).
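A minimal sketch of the check described above, assuming graphA and graphB are JanusGraph instances opened on two different servers against the same storage backend; the handles are placeholders, and the vertex id and value come from the example:

```java
// Sketch of the observation described above. graphA and graphB stand for the
// two JG instances (A and B); how they are opened is not shown here.
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.janusgraph.core.JanusGraph;

public class StaleReadCheck {

    public static void check(JanusGraph graphA, JanusGraph graphB) {
        GraphTraversalSource gA = graphA.traversal();
        GraphTraversalSource gB = graphB.traversal();

        // Instance A: modify the vertex property and commit the transaction.
        gA.V(40964200).property("data", "some_other_value").iterate();
        gA.tx().commit();

        // Instance B: with a per-instance db-cache, this read may keep returning
        // the old value until cache.db-cache-time expires.
        Object seenByB = gB.V(40964200).values("data").next();
        System.out.println("Instance B sees: " + seenByB);
    }
}
```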
While debugging the flow, we identified that when a vertex property modification is triggered, it is persisted in the Guava cache via the GuavaVertexCache add method, and on retrieval the data is read via the get method of the same class. This could be the reason for the above observation.
It feels like we might need to modify the transaction-wise cache as well. Correct me if I am missing something here; I am happy to contribute the implementation to the community once this is done.
Thanks
Wasantha
Hi Boxuan,
I was not using a session on the Gremlin Console, so I guess it does not need an explicit commit. Anyway, I have tried committing the transaction [ g.tx().commit() ] after opening a session, but the same behaviour was observed.
Thanks, Wasantha
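For reference, a sketch of the sessionless vs. session-based behaviour being discussed, using the TinkerPop Java driver; the host, port, and session id are placeholders:

```java
// Sketch of sessionless vs. session-based submission with the TinkerPop Java driver.
import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;

public class CommitModes {
    public static void main(String[] args) throws Exception {
        Cluster cluster = Cluster.build("janusgraph-server-host").port(8182).create();

        // Sessionless client: each request runs in its own transaction and is
        // committed automatically on success.
        Client sessionless = cluster.connect();
        sessionless.submit("g.V(40964200).property('data', 'value_1')").all().get();

        // Session client: the transaction spans requests, so an explicit commit
        // is needed before other clients/instances can see the change.
        Client session = cluster.connect("my-session");
        session.submit("g.V(40964200).property('data', 'value_2')").all().get();
        session.submit("g.tx().commit()").all().get();

        session.close();
        sessionless.close();
        cluster.close();
    }
}
```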
If you do not use sessions, remote requests to Gremlin Server are committed automatically, see: https://tinkerpop.apache.org/docs/current/reference/#considering-transactions
Are you sure that committing a modification is sufficient to move the change from the transaction cache to the database cache, both in the current and in your new Redis implementation? Maybe you can test this with a remote modification followed by a retrieval request for the same vertex from the same client, so that the database cache is filled explicitly (before the second client attempts to retrieve it).
Marc
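A sketch of the test Marc suggests: after committing on instance A, read the same vertex back on instance A first so its database cache is populated, then read it on instance B. graphA and graphB are placeholders for the two instances:

```java
// Sketch of the suggested test: warm instance A's database cache with a read
// after the commit, then check what instance B returns.
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.janusgraph.core.JanusGraph;

public class DbCacheWarmupTest {

    public static void run(JanusGraph graphA, JanusGraph graphB) {
        GraphTraversalSource gA = graphA.traversal();
        GraphTraversalSource gB = graphB.traversal();

        // 1. Modify and commit on instance A.
        gA.V(40964200).property("data", "some_other_value").iterate();
        gA.tx().commit();

        // 2. Read the same vertex back on instance A, so the committed value is
        //    loaded into A's database cache.
        Object seenByA = gA.V(40964200).values("data").next();

        // 3. Read on instance B: if the shared (Redis-backed) cache works, this
        //    should match what instance A sees.
        Object seenByB = gB.V(40964200).values("data").next();

        System.out.println("A sees: " + seenByA + ", B sees: " + seenByB);
    }
}
```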
Hi Boxuan,
We were able to overcome the sync issue across multiple JG instances after modifying void invalidate(StaticBuffer key, List<CachableStaticBuffer> entries) as suggested. We had a few performance issues to resolve, which took considerable effort.
I am happy to contribute the implementation to the community since it is almost done. Please guide me on that.
One more question on the cache:
`private final Cache<KeySliceQuery,EntryList> cache;`
As per the above declaration, the cache object persists only a specific query against the result of that query. It does not cache all the traversed vertices. Is my understanding correct?
Thanks
Wasantha
It's great to hear that you have solved the previous problem.
Regarding contributing to the community, I would suggest you create a GitHub issue first, describing the problem and your approach there. This is not required but recommended. Then, you could create a pull request linking to that issue (note that you would also be asked to sign an individual CLA or corporate CLA once you create your first pull request in JanusGraph).
I am not 100% sure I understand your question, but I guess you are asking what exactly is stored in that cache. Basically, that cache stores the raw data fetched from the storage backend. It does not have to be a vertex; it could be vertex properties or edges. It might also contain deserialized data (see the getCache() method in Entry.java). Note that in your case, since your cache is not local, it might be a better idea to store only the raw data and not the deserialized data, to reduce the network overhead. To achieve that, you could override the Entry::setCache method and let it do nothing. If you are interested in learning more about the "raw data", I wrote a blog post, Data layout in JanusGraph, that you might be interested in.
Hope this helps.
Boxuan
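To make that last point concrete, a self-contained sketch of the idea; the class below is hypothetical and only mirrors the two accessors mentioned in the discussion (getCache()/setCache()), it is not JanusGraph's Entry:

```java
// Hypothetical illustration of "store raw bytes only" for a remote cache.
// RawOnlyEntry is not JanusGraph's Entry class; it just shows the intent of
// making setCache a no-op so only raw data travels over the network.
public final class RawOnlyEntry {

    private final byte[] rawColumnAndValue;  // what actually gets shipped to Redis
    private transient Object deserialized;   // never populated for remotely cached entries

    public RawOnlyEntry(byte[] rawColumnAndValue) {
        this.rawColumnAndValue = rawColumnAndValue;
    }

    public byte[] raw() {
        return rawColumnAndValue;
    }

    public Object getCache() {
        return deserialized;                 // always null in this sketch
    }

    public void setCache(Object cache) {
        // Intentionally a no-op: keeping deserialized objects in a shared Redis
        // cache would only add (de)serialization and network overhead.
    }
}
```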