Re: Recommendation for Storage Backend


Peter Corless <pe...@...>
 

Comments inline.

On Thu, Oct 29, 2020, 9:06 AM Bassem Naguib <bass...@...> wrote:

Hello,

I am looking into using JanusGraph for a new multi-tenant SaaS application, and I wanted to ask the community for help choosing a suitable storage backend for my use case.

Everyone's going to have a bias, and I want to be transparent. I work for ScyllaDB, so of course I think they are the best. I will do my best, however, to give you reasons which I hope prove sufficiently helpful.

We need to have as much data isolation as possible between tenants. In our previous projects with similar requirements, we used a relational DBMS with a database-per-tenant isolation strategy. So I assume, in the JanusGraph world, it will be graph-per-tenant?

That sounds appropriate. You could segregate each tenant's data in Scylla, the storage layer underneath JanusGraph, via separate tables or even completely separate keyspaces.
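As a rough sketch of keyspace-per-tenant isolation: JanusGraph's CQL storage backend picks its keyspace from the `storage.cql.keyspace` option, so one approach is to render a separate properties file per tenant. The `tenant_` prefix, the naming scheme and the helper names below are hypothetical. The 48-character, letters/digits/underscores limit is an assumption borrowed from Cassandra's keyspace naming rules; check it against your Scylla version.

```python
import re

# Assumption: Cassandra-style keyspace names (letters, digits, underscores,
# at most 48 characters). Verify against your Scylla version.
MAX_KEYSPACE_LEN = 48


def tenant_keyspace(tenant_id, prefix="tenant_"):
    """Map a tenant id to a hypothetical keyspace name, e.g. 'tenant_acme_corp'."""
    # Lowercase and replace anything outside [a-z0-9_] with an underscore.
    slug = re.sub(r"[^a-z0-9_]", "_", tenant_id.lower())
    # The prefix also guarantees the name starts with a letter.
    name = prefix + slug
    if len(name) > MAX_KEYSPACE_LEN:
        raise ValueError(f"keyspace name too long: {name!r}")
    return name


def tenant_properties(tenant_id, hosts):
    """Render a JanusGraph properties file that keeps one tenant in its own keyspace."""
    lines = [
        "storage.backend=cql",
        "storage.hostname=" + ",".join(hosts),
        "storage.cql.keyspace=" + tenant_keyspace(tenant_id),
    ]
    return "\n".join(lines) + "\n"
```

Each rendered file could then be opened with `JanusGraphFactory.open(path)`. Note that the slug mapping is lossy ("acme-corp" and "acme_corp" both become `tenant_acme_corp`), so a real system would probably keep an explicit tenant-to-keyspace table rather than deriving names on the fly.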

Also, we know that a single tenant's graph will never grow too big to fit on one server, but we may need to divide the tenant graphs between multiple graph DB servers.

Yes. Scylla automatically shards data across nodes. Whatever persistent database you choose to run underneath JanusGraph, make sure it shards data across nodes automatically as well.
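To illustrate what automatic sharding across nodes looks like, here is a toy token ring with virtual nodes, the general consistent-hashing scheme Cassandra-family databases use to assign partitions to nodes. It is a simplified sketch: it hashes with MD5 instead of Scylla's default Murmur3 partitioner, ignores replication and Scylla's per-core shards, and the node names and key format are made up.

```python
import bisect
import hashlib


def token(key):
    """Hash a key onto a 64-bit ring (MD5 for illustration only, not Murmur3)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class TokenRing:
    def __init__(self, nodes, vnodes_per_node=8):
        # Each node owns several virtual-node tokens so load spreads evenly.
        self._ring = sorted(
            (token(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._tokens = [t for t, _ in self._ring]

    def owner(self, partition_key):
        # The first vnode token >= the key's token owns the key,
        # wrapping around to the start of the ring past the last token.
        idx = bisect.bisect_left(self._tokens, token(partition_key))
        return self._ring[idx % len(self._ring)][1]


ring = TokenRing(["node-a", "node-b", "node-c"])
print(ring.owner("tenant_acme_corp:vertex:42"))
```

The useful property here is that adding a fourth node only moves keys onto the new node; every other key keeps its owner, so a cluster can rebalance without reshuffling all of its data.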

We do not care very much about dividing the DB server resources equally between tenants. So the "Noisy Neighbor" problem is not a concern.

No, but there are problems with having neighbors at all, such as Heartbleed-style data leaks through CPU side-channel attacks like ZombieLoad. We wrote up a piece for people who want to think about ways to protect against attacks from neighbors, noisy or otherwise.


Finally, we are looking for minimal read and write latency, and as close to ACID transactions as we can get.

...this is a fundamental tug of war, because ACID deliberately trades away some speed in order to guarantee consistency.

ScyllaDB does not provide full ACID guarantees. Like Cassandra, it leans toward the AP side of the CAP theorem rather than CP. But we do have lightweight transactions (LWT), and our LWT implementation is inherently more efficient than the design Cassandra chose; it needs fewer round trips, for instance.

But adding *any* sort of linearizability, or any strict consistency level (such as CL=ALL), is going to increase your latencies and/or lower your throughput. Scaling out capacity and concurrency can offset some of that cost, but not necessarily all of it, and scaling has its own prices, limits and tradeoffs, too. That's just the nature of it. Think hard about which matters most to you: strict ACID consistency, or performance and availability.
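A back-of-the-envelope model shows why stricter consistency levels cost latency: the coordinator can only answer once the required number of replicas has acknowledged, so CL=ALL is as slow as the slowest replica. This is a simplified sketch (it ignores coordinator overhead, speculative retries and datacenter-local levels such as LOCAL_QUORUM), and the latency numbers are hypothetical.

```python
def required_acks(level, rf):
    """Replica acknowledgements the coordinator waits for at a consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {level}")


def request_latency(replica_latencies_ms, level):
    """The request completes when the k-th fastest replica has answered."""
    k = required_acks(level, len(replica_latencies_ms))
    return sorted(replica_latencies_ms)[k - 1]


def overlapping(read_level, write_level, rf):
    """A read overlaps the latest acknowledged write when R + W > RF."""
    return required_acks(read_level, rf) + required_acks(write_level, rf) > rf


# Hypothetical RF=3 cluster with one slow replica.
latencies = [2.0, 3.5, 40.0]
for level in ("ONE", "QUORUM", "ALL"):
    print(level, request_latency(latencies, level))
```

With one slow replica, ONE and QUORUM stay fast (2.0 and 3.5 ms) while ALL inherits the 40 ms straggler. The R + W > RF check is the usual rule for a read to see the latest acknowledged write; it is still weaker than linearizability, which is what LWT adds on top.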

You can read more about our LWT implementation here.
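To make the LWT contract concrete: CQL conditional statements such as `INSERT ... IF NOT EXISTS` and `UPDATE ... IF col = value` either apply atomically or report that they were not applied (the `[applied]` column) along with the current value. The toy below mimics only that contract inside a single process, with a lock standing in for the Paxos rounds Scylla runs across replicas; the class and method names are hypothetical.

```python
import threading


class ToyCASStore:
    """Single-process stand-in for conditional (compare-and-set) writes.

    Not how Scylla implements LWT: a local lock replaces the cross-replica
    consensus round, so this shows semantics only, not performance.
    """

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def insert_if_not_exists(self, key, value):
        # Like INSERT ... IF NOT EXISTS: returns (applied, current value).
        with self._lock:
            if key in self._data:
                return False, self._data[key]
            self._data[key] = value
            return True, value

    def update_if(self, key, expected, new):
        # Like UPDATE ... IF col = expected: returns (applied, current value).
        with self._lock:
            current = self._data.get(key)
            if current != expected:
                return False, current
            self._data[key] = new
            return True, new


store = ToyCASStore()
print(store.insert_if_not_exists("tenant:acme", "v1"))
print(store.insert_if_not_exists("tenant:acme", "v2"))
```

The second insert is rejected and hands back the existing value, which is what lets a client detect a lost race instead of silently overwriting. In a real cluster, that guarantee is what costs the extra round trips discussed above.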


Other vendors can definitely claim stricter adherence to ACID. The question is which of these are the higher priorities for your use case, and what your SLAs for each might be:

• Latency
• Throughput
• Availability
• Consistency (even at the expense of the above three)
• "Correctness" in terms of ACID compliance

When it comes down to it, our users feel the top three win out over the bottom two. But this is definitely a use-case-specific judgment call for you.

A couple of real-world use cases to consider:



Sincerely,

-Peter Corless.

I would love to hear you guys' thoughts about suitable storage backend(s) for this use case.

Thanks in advance!

--
You received this message because you are subscribed to the Google Groups "JanusGraph users" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-users/5ca4c350-9558-4d7b-9cc8-fe2b6a817975n%40googlegroups.com.
