Recommendation for Storage Backend
Bassem Naguib <bass...@...>
I am looking into using JanusGraph for a new multi-tenant SaaS application. And I wanted to ask the community for help on choosing a suitable storage backend for my use case.
We need to have as much data isolation as possible between tenants. In our previous projects with similar requirements, we used a relational DBMS with a database-per-tenant isolation strategy. So I assume, in the JanusGraph world, it will be graph-per-tenant?
Also we know that a single tenant's graph will never grow too big to fit on one server. But we may need to divide the tenant graphs between multiple graph DB servers.
We do not care very much about dividing the DB server resources equally between tenants. So the "Noisy Neighbor" problem is not a concern.
Finally, we are looking for minimum read and write latency. And as close to ACID transactions as we can get.
I would love to hear your thoughts about suitable storage backend(s) for this use case.
Thanks in advance!
Peter Corless <pe...@...>
On Thu, Oct 29, 2020, 9:06 AM Bassem Naguib <bass...@...> wrote:
Everyone's going to have a bias, and I want to be transparent. I work for ScyllaDB, so of course I think they are the best. I will do my best, however, to give you reasons which I hope prove sufficiently helpful.
That sounds appropriate. You might segregate tenant data in Scylla, which would sit underneath JanusGraph, via separate tables or even completely separate keyspaces.
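A minimal sketch of the keyspace-per-tenant idea, assuming JanusGraph's CQL storage backend is used to talk to Scylla or Cassandra. The option names `storage.backend`, `storage.hostname` and `storage.cql.keyspace` are JanusGraph configuration keys; the `tenant_` prefix, the helper names and the host addresses are hypothetical and only there for illustration.

```python
import re

# Keyspace names are limited to 48 characters in Cassandra-compatible stores.
MAX_KEYSPACE_LENGTH = 48


def tenant_keyspace(tenant_id, prefix="tenant_"):
    """Map a tenant id to a keyspace name (hypothetical naming scheme)."""
    # Reject anything outside lowercase letters, digits and underscores rather
    # than rewriting it, so two distinct tenant ids can never share a keyspace.
    if not re.fullmatch(r"[a-z0-9_]+", tenant_id):
        raise ValueError(f"unsupported tenant id: {tenant_id!r}")
    name = prefix + tenant_id
    if len(name) > MAX_KEYSPACE_LENGTH:
        raise ValueError(f"keyspace name too long: {name!r}")
    return name


def janusgraph_properties(tenant_id, hosts):
    """Build the storage settings for one tenant's graph."""
    return {
        "storage.backend": "cql",
        "storage.hostname": ",".join(hosts),
        "storage.cql.keyspace": tenant_keyspace(tenant_id),
    }


def to_properties_text(props):
    """Render the settings in .properties file syntax."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    # Example hosts are placeholders, not real servers.
    print(to_properties_text(janusgraph_properties("acme", ["10.0.0.1", "10.0.0.2"])))
```

Each tenant would then open its own graph from its own properties, so one tenant's vertices and edges never share a keyspace with another's. Whether this is isolated enough for you depends on your requirements; it separates data, not hardware.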
Yes. Scylla would automatically shard data across nodes. Whatever persistent database you choose to reside under JanusGraph, make sure it automatically shards data across nodes.
No, but there are still problems with having neighbors at all, for example Heartbleed-like attacks such as Zombieload. We wrote this up as a piece for people who wish to think about ways to protect against attacks from neighbors, noisy or otherwise.
...this is a fundamental tug of war, because ACID specifically gives up some speed in order to provide consistency.
ScyllaDB is not written with a full ACID guarantee. Like Cassandra, it leans toward the AP side of the CAP theorem rather than CP. But we do have LWT (lightweight transactions), and our LWT implementation is inherently more efficient than the design decisions Cassandra made: fewer round trips, for instance. But adding *any* sort of linearizability, or any strict consistency level (like CL=ALL), is going to increase your latencies and/or lower your throughput, unless (or even if) you scale in terms of, say, capacity and concurrency. Those have their own prices, limits and tradeoffs, too. That's just the nature of it. It is vital for you to really think about which matters most: strict ACID consistency or performance / availability.
You can read more about our LWT implementation here.
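To make the consistency-versus-latency tradeoff concrete, here is a rough sketch of the replica arithmetic behind consistency levels in Cassandra-style stores. It assumes a single datacenter with replication factor `rf`; the function names are made up for this example. It only models ordinary tunable consistency, not LWT, which adds a consensus round on top.

```python
def replicas_required(level, rf):
    """Number of replicas that must respond for a given consistency level."""
    if rf < 1:
        raise ValueError("replication factor must be at least 1")
    level = level.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # a strict majority of the replicas
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {level}")


def reads_see_latest_write(read_cl, write_cl, rf):
    """True when every read set must overlap every write set (R + W > RF)."""
    return replicas_required(read_cl, rf) + replicas_required(write_cl, rf) > rf


def tolerated_failures(level, rf):
    """How many replicas can be down while the level can still be satisfied."""
    return rf - replicas_required(level, rf)


if __name__ == "__main__":
    rf = 3
    for level in ("ONE", "QUORUM", "ALL"):
        print(level, replicas_required(level, rf), tolerated_failures(level, rf))
    print(reads_see_latest_write("QUORUM", "QUORUM", rf))
    print(reads_see_latest_write("ONE", "ONE", rf))
```

With `rf = 3`, QUORUM reads and writes overlap and still survive one replica being down, while CL=ALL waits for every replica and tolerates no failures. That is why the stricter levels cost latency and availability.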
Other vendors can definitely claim more strict adherence to ACID. The question would be to clarify for your use case which are the higher priorities, and what your SLAs for each might be.
• Consistency (even at the sacrifice of the above three)
• "Correctness" in terms of ACID compliance
When it comes down to it, our users feel the top three win out over the bottom two. But this is definitely a use-case specific judgment call for you.
A couple of real-world use cases to consider:
• Zeotap: https://www.scylladb.com/2020/05/14/zeotap-a-graph-of-twenty-billion-ids-built-on-scylla-and-janusgraph/
• FireEye: https://www.scylladb.com/2020/02/04/fireeye-providing-real-time-threat-analysis-using-a-graph-database/
Bassem Naguib <bass...@...>
Thank you so much Peter for the detailed answer! I think we need to switch between ScyllaDB and Cassandra to understand the differences better. It is good that JanusGraph enables you to switch between storage backends hopefully without a lot of trouble.
On Friday, October 30, 2020 at 4:55:55 PM UTC-6 p...@... wrote: