JanusGraph database cache on distributed setup


washerath@...
 

In a multi-node JanusGraph cluster, a data modification made on one instance does not sync to the others until the cache reaches its configured expiry time (cache.db-cache-time).

As per the documentation [1], enabling the database-level cache is not recommended in a distributed setup because cached data is not shared among instances.

Any suggestions for a solution/workaround so that data changes from other JG instances are visible immediately and stale data access is avoided?


[1] https://docs.janusgraph.org/operations/cache/#cache-expiration-time


hadoopmarc@...
 

How fast is immediately? A well-dimensioned Cassandra or ScyllaDB cluster (with its own block cache!) should be able to serve requests at the ms level.

https://www.scylladb.com/2017/09/18/scylla-2-0-workload-conditioning/

You only run into trouble with queries that ask for tens or hundreds of vertices, but you may ask yourself whether it is reasonable to expect ms-level realtime behaviour for such large queries.

Best wishes,     Marc


washerath@...
 

Actually, the concern is with the db.cache feature.

Once we enable db.cache, any modification made to a particular vertex is visible only to that JG instance until the cache expires. So with multiple JG instances, modifications made on one instance are not reflected on the others immediately.

If we could have a centralized cache that syncs across all JG instances, this could be avoided.

Thanks, Wasantha


Boxuan Li
 

Hi Wasantha,

A centralized cache is a good idea in many use cases. What you could do is maintain a centralized cache yourself. This, however, requires some changes to your application code (e.g. your app might need to do a lookup in the cache and then query JanusGraph). A more advanced approach is to rewrite ExpirationKCVSCache (https://javadoc.io/doc/org.janusgraph/janusgraph-core/latest/org/janusgraph/diskstorage/keycolumnvalue/cache/ExpirationKCVSCache.html) yourself and let it store entries in a centralized cache rather than the local cache. Then the db.cache feature should still work, except that the cache is synced across JanusGraph instances.
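The idea can be sketched with a toy model (plain Java; the class and method names below are illustrative, not JanusGraph or Redis APIs): every instance reads and writes through the same shared store, so an update or eviction made by one instance is immediately visible to all others.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model: a shared map stands in for a centralized cache such as Redis.
// Each "instance" goes through its own handle, but all handles hit the same
// backing store, so there is no per-instance staleness.
public class SharedCacheSketch {

    // Stands in for the centralized cache all JanusGraph instances would share.
    static final Map<String, String> shared = new ConcurrentHashMap<>();

    // One handle per JanusGraph instance; it keeps no local state of its own,
    // which is what makes reads consistent across instances.
    static class InstanceCache {
        String get(String key) { return shared.get(key); }
        void put(String key, String value) { shared.put(key, value); }
        void invalidate(String key) { shared.remove(key); }
    }
}
```

In the real rewrite, the shared map would be replaced by calls to a Redis client, and the keys/values would be the serialized query keys and entry lists that ExpirationKCVSCache handles.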

Best,
Boxuan

On Feb 10, 2022, at 10:59 PM, washerath@... wrote:



washerath@...
 

Hi Boxuan,

I am evaluating the approach of rewriting ExpirationKCVSCache as suggested. There I could replace the existing Guava cache implementation with one that connects to a remote Redis DB, so that Redis acts as a centralized cache shared by all JG instances.

While going through the JG source, I found that the same Guava cache implementation (cachebuilder = CacheBuilder.newBuilder()) is used in several other places, e.g. GuavaSubqueryCache, GuavaVertexCache, ...

Will it be sufficient to modify only ExpirationKCVSCache, or do we need to modify several other places as well?

Thanks

Wasantha


Boxuan Li
 

Hi Wasantha,


It's great to see that you have made some progress. If possible, it would be awesome if you could contribute your implementation to the community!

Yes, modifying `ExpirationKCVSCache` is enough. `GuavaSubqueryCache` and `GuavaVertexCache` are transaction-wise caches, so you don't want to make them global. `RelationQueryCache` and `SchemaCache` are graph-wise caches; you could make them global, but it is not necessary since they only store schema rather than real data. Actually, I would recommend not doing so, because JanusGraph already has a mechanism for evicting stale schema cache entries.

Best,
Boxuan


washerath@...
 

Hi Boxuan,

I was able to change the ExpirationKCVSCache class to persist the cache in a Redis DB,

but I can still see some data anomalies between the two JG instances. For example, when I change a property of a vertex on one JG server [ g.V(40964200).property('data', 'some_other_value') ]

[screenshot: JG instance A]

the change is not reflected on the other JG instance.

[screenshot: JG instance B]

When debugging the flow, we identified that when a vertex property modification is triggered, it is persisted in the Guava cache via the GuavaVertexCache add method, and on retrieval the data is read via the get method of the same class. This could be the reason for the above observation.

It feels like we might need to modify the transaction-wise cache as well. Correct me if I am missing something here; I am happy to contribute the implementation to the community once this is done.

Thanks

Wasantha


Boxuan Li
 

Hi Wasantha,

In your example, it looks like you didn't commit your transaction on JG instance A. Uncommitted changes are only visible to the local transaction on the local instance. Can you try committing it first on A and then querying on B?
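For example, in a sessioned Gremlin Console (assuming a session has already been opened against each instance), the sequence would look something like:

```groovy
// On JG instance A (sessioned connection)
g.V(40964200).property('data', 'some_other_value')
g.tx().commit()    // make the change visible outside this transaction

// On JG instance B
g.V(40964200).values('data')
```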

Best,
Boxuan


washerath@...
 

Hi Boxuan,

I was not using a session in the Gremlin Console, so I guess there is no need to commit explicitly. Anyway, I have tried committing the transaction [ g.tx().commit() ] after opening a session, but the same behaviour was observed.

Thanks

Wasantha


Boxuan Li
 

Hi Wasantha,

I am not familiar with the transaction scope when using a remote Gremlin server, so I could be wrong, but could you try rolling back the transaction explicitly on JG instance B? Just to make sure you are not accessing the stale data cached in a local transaction.

Best,
Boxuan

On Feb 19, 2022, at 11:51 AM, washerath@... wrote:




hadoopmarc@...
 

If you do not use sessions, remote requests to Gremlin Server are committed automatically, see: https://tinkerpop.apache.org/docs/current/reference/#considering-transactions .

Are you sure that committing a modification is sufficient to move the change from the transaction cache to the database cache, both in the current implementation and in your new Redis implementation? Maybe you can test this by issuing a remote modification followed by a retrieval request for the same vertex from the same client, so that the database cache is filled explicitly (before the second client attempts to retrieve it).

Marc


Boxuan Li
 

Thanks Marc for making it clear.

@Wasantha, how did you implement your void invalidate(StaticBuffer key, List<CachableStaticBuffer> entries) method? Make sure you evict this key from your Redis cache. The default implementation in JanusGraph does not evict it immediately. Rather, it records this key in a local HashMap called expiredKeys and evicts the entry after a timeout. If you use this approach, and you don’t store expiredKeys on Redis, then your other instance could still read stale data. I personally think the usage of expiredKeys is not necessary in your case - you could simply evict the entry from Redis in the invalidate call.

If you still have a problem, probably a better way is to share your code so that we could take a look at your implementation.

Best,
Boxuan

On Feb 20, 2022, at 6:23 AM, hadoopmarc@... wrote:



washerath@...
 

Hi Boxuan,

We were able to overcome the sync issue across multiple JG instances after modifying void invalidate(StaticBuffer key, List<CachableStaticBuffer> entries) as suggested. We had a few performance issues to resolve, which took considerable effort.

I am happy to contribute the implementation to the community since it's almost done. Please guide me on that.

One another question on cache :

private final Cache<KeySliceQuery,EntryList> cache;

As per the above initialization of the cache object, it persists only the specific query against the result of that query. It does not cache all the traversed vertices. Is my understanding correct?

Thanks
Wasantha


Boxuan Li
 

Hi Wasantha,

It's great to hear that you have solved the previous problem.

Regarding contributing to the community, I would suggest you create a GitHub issue first, describing the problem and your approach there. This is not required but recommended. Then, you could create a pull request linking to that issue (note that you would also be asked to sign an individual CLA or corporate CLA once you create your first pull request in JanusGraph).

I am not 100% sure I understand your question, but I guess you are asking what exactly is stored in that cache. Basically, that cache stores the raw data fetched from the storage backend. It does not have to be a vertex; it could be vertex properties or edges. It might also contain deserialized data (see the getCache() method in Entry.java). Note that in your case, since your cache is not local, it might be a better idea to store only the raw data and not the deserialized data, to reduce network overhead. To achieve that, you could override the Entry::setCache method and let it do nothing. If you are interested in learning more about the "raw data", I wrote a blog post, Data layout in JanusGraph, that you might find interesting.
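That per-query behaviour can be illustrated with a toy model (plain Java; the class and key names are illustrative, not the real KeySliceQuery/EntryList types): entries are keyed by the slice query itself, so only slices that were actually queried get cached, and a different query over the same vertex is a fresh backend read.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Toy model of Cache<KeySliceQuery,EntryList>: results are keyed by the query,
// so caching one slice of a vertex caches neither the vertex as a whole nor
// any other slice of it.
public class QueryKeyedCacheSketch {
    final Map<String, List<String>> cache = new HashMap<>();
    int backendReads = 0;  // counts trips to the "storage backend"

    List<String> getSlice(String query, Function<String, List<String>> backend) {
        return cache.computeIfAbsent(query, q -> { backendReads++; return backend.apply(q); });
    }
}
```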

Hope this helps.
Boxuan