JanusGraph database cache on distributed setup
washerath@...
In a multi-node JanusGraph cluster, a data modification done from one instance does not sync with the others until the cache reaches the given expiry time. As per the documentation [1], enabling the database-level cache in a distributed setup is not recommended, because cached data is not shared among instances. Any suggestions for a solution or workaround so that I can see data changes from other JG instances immediately and avoid stale data access? [1] https://docs.janusgraph.org/operations/cache/#cache-expiration-time
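For reference, the cache being discussed is the database-level cache controlled by the cache.db-cache options. A minimal sketch of opening a graph with it enabled is shown below; the backend, hostname and the 3-minute expiry are example values, not taken from the thread:

    import org.janusgraph.core.JanusGraph;
    import org.janusgraph.core.JanusGraphFactory;

    // Sketch only: open a JanusGraph instance with the db-level cache enabled.
    JanusGraph graph = JanusGraphFactory.build()
            .set("storage.backend", "cql")          // example backend
            .set("storage.hostname", "127.0.0.1")   // example host
            .set("cache.db-cache", true)            // the db.cache feature discussed in this thread
            .set("cache.db-cache-time", 180000)     // expiry in ms; until then other instances may serve stale data
            .open();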
hadoopmarc@...
How fast is immediately? A well-dimensioned Cassandra or ScyllaDB cluster (with its own block cache!) should be able to serve requests at the ms level.
https://www.scylladb.com/2017/09/18/scylla-2-0-workload-conditioning/ You only run into trouble with queries that ask for tens or hundreds of vertices, but you can ask whether it is reasonable to expect ms-level real-time responses for such large queries. Best wishes, Marc
washerath@...
Actually the concern is with the db.cache feature.
Once we enable db.cache, whatever modification is done to a particular vertex is only visible to that JG instance until the cache expires. So if we have multiple JG instances, modifications done from one instance do not reflect on the others immediately. If we could have a centralized cache that syncs across all JG instances, this could be avoided. Thanks, Wasantha
Boxuan Li
Hi Wasantha,
A centralized cache is a good idea in many use cases. What you could do is maintain a centralized cache yourself. This, however, requires some changes to your application code (e.g. your app might need to do a lookup in the cache and then query JanusGraph). A more advanced approach is to rewrite ExpirationKCVSCache (https://javadoc.io/doc/org.janusgraph/janusgraph-core/latest/org/janusgraph/diskstorage/keycolumnvalue/cache/ExpirationKCVSCache.html) yourself and let it store entries in a centralized cache rather than the local cache. Then the db.cache feature should still work, except that the cache is synced across JanusGraph instances. Best, Boxuan
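The first (application-level) option could look roughly like the sketch below, assuming a Jedis client and a remote traversal source g; the key scheme, TTL, labels and property names are made up for illustration:

    import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
    import redis.clients.jedis.Jedis;

    // Sketch only: check the shared cache first, fall back to JanusGraph on a miss.
    public String findUserEmail(Jedis jedis, GraphTraversalSource g, String userId) {
        String cacheKey = "user:email:" + userId;            // hypothetical key scheme
        String cached = jedis.get(cacheKey);
        if (cached != null) {
            return cached;                                   // served from the centralized cache
        }
        String email = (String) g.V().has("user", "userId", userId)
                                 .values("email").next();    // hypothetical schema
        jedis.setex(cacheKey, 60, email);                    // short TTL so entries self-expire
        return email;
    }

Invalidation on writes remains the application's job with this approach, which is why the ExpirationKCVSCache route discussed here is more transparent to the rest of the code.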
washerath@...
Hi Boxuan, Wasantha
Boxuan Li
Hi Wasantha,
washerath@...
Hi Boxuan, Wasantha
Boxuan Li
Hi Wasantha,
In your example, it looks like you didn't commit your transaction on JG instance A. Uncommitted changes are only visible to the local transaction on the local instance. Can you try committing it first on A and then querying on B? Best, Boxuan
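In embedded terms, the suggested sequence looks roughly like the sketch below; graphA and graphB stand for two hypothetical JanusGraph instances opened against the same storage backend, and the property keys and values are made-up examples:

    import org.janusgraph.core.JanusGraph;

    // Sketch: commit on instance A, then read the same vertex from instance B.
    static Object commitOnAThenReadOnB(JanusGraph graphA, JanusGraph graphB) {
        graphA.traversal().V().has("name", "alice").property("age", 30).iterate();
        graphA.tx().commit();    // without this, the change stays in A's local transaction

        Object age = graphB.traversal().V().has("name", "alice").values("age").next();
        graphB.tx().rollback();  // close B's read transaction
        return age;
    }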
washerath@...
Hi Boxuan, Wasantha
Boxuan Li
Hi Wasantha,
I am not familiar with the transaction scope when using a remote Gremlin server, so I could be wrong, but could you try rolling back the transaction explicitly on JG instance B? Just to make sure you are not accessing stale data cached in a local transaction. Best, Boxuan
hadoopmarc@...
If you do not use sessions, remote requests to Gremlin Server are committed automatically, see: https://tinkerpop.apache.org/docs/current/reference/#considering-transactions .
Are you sure that committing a modification is sufficient to move the change from the transaction cache to the database cache, both in the current implementation and in your new Redis implementation? Maybe you can test by sending a remote modification request followed by a retrieval request for the same vertex from the same client, so that the database cache is filled explicitly (before the second client attempts to retrieve it). Marc
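That test could be scripted with the Gremlin Java driver roughly as follows; the host, port, traversal-source name and properties are assumptions for the sketch, and as noted above the sessionless requests are auto-committed:

    import org.apache.tinkerpop.gremlin.driver.Cluster;
    import org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection;
    import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
    import static org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource.traversal;

    public class CachePrimeTest {
        public static void main(String[] args) {
            // Host, port and traversal-source name are example values.
            Cluster cluster = Cluster.build("jg-instance-a").port(8182).create();
            GraphTraversalSource g = traversal().withRemote(DriverRemoteConnection.using(cluster, "g"));

            g.V().has("name", "alice").property("age", 31).iterate();  // sessionless request, auto-committed
            g.V().has("name", "alice").values("age").next();           // same client reads it back, which
                                                                        // should fill that instance's db cache
            cluster.close();
        }
    }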
Boxuan Li
Thanks Marc for making it clear.
@Wasantha, how did you implement your void invalidate(StaticBuffer key, List<CachableStaticBuffer> entries) method? Make sure you evict this key from your Redis cache. The default implementation in JanusGraph does not evict it immediately; rather, it records this key in a local HashMap called expiredKeys and evicts the entry after a timeout. If you use this approach and you don't store expiredKeys in Redis, then your other instance could still read stale data. I personally think the usage of expiredKeys is not necessary in your case: you could simply evict the entry from Redis in the invalidate call. If you still have a problem, probably a better way is to share your code so that we can take a look at your implementation. Best, Boxuan
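An eager-eviction invalidate() along these lines might look like the sketch below. It assumes the custom cache's getSlice() stores each slice result in Redis and also records that slice's Redis key in a per-row index set ("jg:row:" + encoded row key); the toRedisKey() helper and the jedisPool field are hypothetical, and the CachableStaticBuffer import is assumed to live in the same package as ExpirationKCVSCache:

    import java.util.List;
    import java.util.Set;
    import org.janusgraph.diskstorage.StaticBuffer;
    import org.janusgraph.diskstorage.keycolumnvalue.cache.CachableStaticBuffer; // assumed package
    import redis.clients.jedis.Jedis;

    // Sketch of an override inside the custom KCVSCache subclass (signature as quoted
    // in the thread). It evicts eagerly from Redis instead of tracking expiredKeys locally.
    @Override
    public void invalidate(StaticBuffer key, List<CachableStaticBuffer> entries) {
        String rowIndexKey = "jg:row:" + toRedisKey(key);          // toRedisKey(): hypothetical row-key encoder
        try (Jedis jedis = jedisPool.getResource()) {              // jedisPool: hypothetical JedisPool field
            Set<String> cachedSliceKeys = jedis.smembers(rowIndexKey);
            if (!cachedSliceKeys.isEmpty()) {
                jedis.del(cachedSliceKeys.toArray(new String[0])); // drop every cached slice of this row
            }
            jedis.del(rowIndexKey);                                // drop the per-row index set itself
        }
    }

Eager eviction removes the need for an expiredKeys map, which only protects the local instance anyway.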
washerath@...
Hi Boxuan, private final Cache<KeySliceQuery,EntryList> cache; As per the above declaration of the cache object, it persists only the specific query against the result of that query. It does not cache all the traversed vertices. Is my understanding correct?
|
Boxuan Li
Hi Wasantha,
It's great to hear that you have solved the previous problem. Regarding contributing to the community, I would suggest you create a GitHub issue first, describing the problem and your approach there. This is not required but recommended. Then, you could create a pull request linking to that issue (note that you will also be asked to sign an individual CLA or corporate CLA when you create your first pull request in JanusGraph). I am not 100% sure I understand your question, but I guess you are asking what exactly is stored in that cache. Basically, that cache stores the raw data fetched from the storage backend. It does not have to be a vertex; it could be vertex properties or edges. It might also contain deserialized data (see the getCache() method in Entry.java). Note that in your case, since your cache is not local, it might be a better idea to store only the raw data and not the deserialized data, to reduce network overhead. To achieve that, you could override the Entry::setCache method and let it do nothing. If you are interested in learning more about the "raw data", I wrote a blog post, Data layout in JanusGraph, that you might be interested in. Hope this helps. Boxuan