Shared cache store for JG


Misha Brukman <mbru...@...>
 

+1 to an optional, process-external, pluggable (multi-implementation) caching system.

I like Rafael's idea of decoupling serialization from the objects themselves by allowing a custom marshal/unmarshal API: for example, if your Java objects are protocol buffers, the custom API lets you use the protobuf serialization methods rather than Java object serialization, which may be faster.
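
For instance, something along these lines (purely illustrative; neither type exists in JG, the names are made up, and a generated protobuf message would supply its own Parser):

    import com.google.protobuf.InvalidProtocolBufferException;
    import com.google.protobuf.Message;
    import com.google.protobuf.Parser;

    // Illustrative sketch only: a pluggable marshalling contract for cached values.
    interface CacheValueMarshaller<V> {
        byte[] marshal(V value);
        V unmarshal(byte[] bytes);
    }

    // Protobuf-backed implementation: reuses the encoder/decoder that protoc already
    // generates, so no java.io.Serializable machinery is involved.
    class ProtobufMarshaller<M extends Message> implements CacheValueMarshaller<M> {
        private final Parser<M> parser; // e.g. MyCachedValue.parser() for a generated message class

        ProtobufMarshaller(Parser<M> parser) {
            this.parser = parser;
        }

        @Override
        public byte[] marshal(M value) {
            return value.toByteArray();          // protobuf's own binary encoding
        }

        @Override
        public M unmarshal(byte[] bytes) {
            try {
                return parser.parseFrom(bytes);  // protobuf's own decoder
            } catch (InvalidProtocolBufferException e) {
                throw new IllegalStateException("Corrupt cached value", e);
            }
        }
    }

Kryo, Thrift, or plain Java serialization could plug in behind the same contract without the cache layer having to know anything about the value types.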



Rafael Fernandes <luizr...@...>
 

Hi Tg,
We decided to implement a cache layer (a cache-as-a-service microservice layer) outside KCVS because it gives us control over how our data is cached.
We've tried both Redis and Hazelcast; the latter performed around 3x faster than Redis when run in what Hazelcast calls its client/server topology. Its auto-discovery is also fantastic.

I love the idea of sharing the cache across multiple processes (we had to implement ours outside JG). I'd say we should make it as generic as possible so we can plug in other vendors like Hazelcast.

As for point 1):
I'm not a fan of serializing the entire object chain and state; sometimes it's just too much work, and not everything needs to be serialized. I'd have the caller implement "marshal/unmarshal" methods that the cache invokes before sending the object over the wire and when restoring its state. We've been using this strategy for a while and it is faster, a lot faster.
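
Roughly the shape I mean (toy sketch, nothing here is JG API; a ConcurrentHashMap just stands in for the remote store):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    // Toy sketch: the cache only ever sees byte[]; the caller supplies the
    // marshal/unmarshal functions and so decides how much object state crosses the wire.
    class MarshallingCache<K, V> {
        private final Function<V, byte[]> marshal;      // caller-supplied
        private final Function<byte[], V> unmarshal;    // caller-supplied
        private final ConcurrentHashMap<K, byte[]> wire = new ConcurrentHashMap<>(); // stand-in for Redis/Hazelcast

        MarshallingCache(Function<V, byte[]> marshal, Function<byte[], V> unmarshal) {
            this.marshal = marshal;
            this.unmarshal = unmarshal;
        }

        void put(K key, V value) {
            wire.put(key, marshal.apply(value));        // marshal right before "sending"
        }

        V get(K key) {
            byte[] bytes = wire.get(key);
            return bytes == null ? null : unmarshal.apply(bytes); // restore state on the way back
        }
    }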

Point 2):
Oh yeah, we're all implementing caching all over the place. Making it available out of the box, optional, and pluggable across many vendors is like a dream.
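
For the pluggable part, I picture something like this (purely hypothetical; SharedCache and the provider registry are made-up names, nothing like this exists in JG today):

    import java.util.Map;
    import java.util.Optional;
    import java.util.function.Supplier;

    // Hypothetical sketch of the optional/pluggable part.
    interface SharedCache {
        byte[] get(byte[] key);
        void put(byte[] key, byte[] value);
    }

    final class SharedCacheProviders {
        // Vendor modules (Redis, Hazelcast, ...) would register their constructors here.
        private static final Map<String, Supplier<SharedCache>> PROVIDERS = Map.of();

        // Empty result when nothing is configured, so the feature stays off by default.
        static Optional<SharedCache> open(String configuredBackend) {
            return Optional.ofNullable(PROVIDERS.get(configuredBackend)).map(Supplier::get);
        }
    }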

I'd be more than happy to implement the Hazelcast version if we get upvotes on this.

cheers,
rafa




Tunay Gür <tuna...@...>
 

Hi JG Developers, 

We (Uber) are currently using JG in production with a multi-machine deployment. Even though each JG process is able to cache KeySliceQuery results individually (db-cache), the fact that the cache isn't shared among the instances makes it less efficient. We thought we could get better performance by replacing the per-process, in-memory cache with a shared cache (e.g. Redis + Twemproxy). Since I'm at the early stages of the implementation, I'd really appreciate any suggestions in this direction. I also have the following particular questions.

1. I'll be providing RedisKCVSCache.java, which will be similar to ExpirationKCVSCache.java, and I'll have both implement a common cache interface. ExpirationKCVSCache has the luxury of storing Java objects directly, since it sits inside the Java process itself. However, to be able to cache KeySliceQuery result objects (i.e. StaticArrayEntryList) in an external store, I'm going to make them serializable. Any other suggestions here? (A rough sketch of what I have in mind follows after question 2.)

2. My initial benchmarks showed ~15x latency improvements at 100% cache utilization compared to a Cassandra-backed, cache-disabled deployment. The overall perf gains look lucrative enough for us to invest in the implementation; however, I'm still not sure whether this would be a useful feature for other JG users. Are you aware of any interest in this kind of feature?
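
To make question 1 concrete, here's the very rough shape I have in mind (placeholder types only: the byte[] key stands in for a hashed KeySliceQuery, Serializable stands in for StaticArrayEntryList once it's made serializable, the Supplier stands in for the wrapped store on a miss, and I'm assuming the Jedis client for Redis):

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;
    import java.util.function.Supplier;
    import redis.clients.jedis.Jedis;

    // Very rough sketch, not the real KCVSCache API: it only shows the
    // serialize-on-populate / deserialize-on-hit shape of a Redis-backed cache.
    public class RedisSliceCacheSketch {
        private final Jedis redis = new Jedis("localhost", 6379);

        public Serializable getSlice(byte[] queryKey, Supplier<Serializable> backingStore)
                throws IOException, ClassNotFoundException {
            byte[] hit = redis.get(queryKey);
            if (hit != null) {
                // Cache hit from any JG instance in the cluster: rebuild the result object.
                try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(hit))) {
                    return (Serializable) in.readObject();
                }
            }
            // Miss: read from the real store, then populate the shared cache.
            Serializable result = backingStore.get();
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(result);
            }
            redis.set(queryKey, buffer.toByteArray());
            return result;
        }
    }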

Thanks 
-Tg