Is JanusGraph more recommended to use?


gobi....@...
 

Hi guys,

I am currently running a Cassandra cluster in production, and I have recently been exploring JanusGraph. I tried JanusGraph Server + Cassandra + ES. I have some questions:
1. Does JanusGraph return query results more efficiently than Cassandra does using indexing?
2. Does it have any advantages over Cassandra?
3. Kindly advise me on how to set up a JanusGraph cluster with high availability and fault tolerance.
4. Is there a backup solution available for JanusGraph?
5. Is there a way to migrate existing data?

Thanks in advance.


Aaron Ploetz <aaron...@...>
 

1. Yes, Cassandra secondary indexes are anti-patterns.  With JanusGraph, node and relationship data are stored together to increase traversal efficiency.  Do note that JanusGraph will create its own Cassandra keyspace(s), and store data differently than a simple CRUD application would.  You can't just put JanusGraph on top of an existing Cassandra cluster and expect graph traversals to work.  Data has to be loaded/written through the JanusGraph process.

2. Let's be clear on something here.  Cassandra is a partitioned row store, and JanusGraph is a graph database.  They do not support similar use cases.  Therefore, the advantage is simple: if your application requires data relationship traversals, JanusGraph can do it and Cassandra cannot.  If you need to store data across partitions in "wide rows" while supporting a scalable query pattern, that's what Cassandra does well.

3. Check out the configuration documentation.  Also, the conf/ directory comes with several configuration files which you can use as a template to get started.  Pick the one that serves your architecture, add the IPs and authentication creds, and reference it from within your gremlin-server.yaml:

graphs: {
  graph: conf/janusgraph-cql-es-server.properties
}
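
To give a rough idea of what goes into the properties file that the graphs entry points at, here is a minimal sketch that renders one for a CQL + Elasticsearch backend. It is not the shipped template: the render_properties helper is hypothetical, and the IPs, user name, password, and keyspace are placeholders. The property keys are the usual JanusGraph storage and index options, but do check them against the configuration reference for your version.

```python
def render_properties(cassandra_hosts, es_hosts, username, password,
                      keyspace="janusgraph"):
    """Render a JanusGraph properties file for a CQL + Elasticsearch backend."""
    settings = {
        "gremlin.graph": "org.janusgraph.core.JanusGraphFactory",
        "storage.backend": "cql",
        # Comma-separated list of Cassandra contact points.
        "storage.hostname": ",".join(cassandra_hosts),
        "storage.username": username,
        "storage.password": password,
        "storage.cql.keyspace": keyspace,
        "index.search.backend": "elasticsearch",
        "index.search.hostname": ",".join(es_hosts),
    }
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


if __name__ == "__main__":
    print(render_properties(
        cassandra_hosts=["10.0.0.1", "10.0.0.2", "10.0.0.3"],  # placeholder IPs
        es_hosts=["10.0.1.1"],                                  # placeholder IP
        username="janus_user",                                  # placeholder
        password="change-me",                                   # placeholder
    ))
```

Every JanusGraph server instance can be given the same rendered file, since they all share one Cassandra and one Elasticsearch cluster.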

As for HA, it is a good idea to run multiple JanusGraph server processes, which all use the same Cassandra and Elasticsearch clusters as a backend.  I usually build out Cassandra keyspaces with a replication factor (RF) of 3.  Because of this, it makes sense to build Cassandra clusters in multiples of 3 nodes, as well.  Likewise, HA will be improved if you can spread instances across multiple availability zones (AZs).  I usually deploy my infra in configurations where the number of AZs == RF, and spread instances evenly across them.  This way, you can lose an entire AZ, and still be able to serve operations at QUORUM consistency.  I'll also build at least one JanusGraph instance in each AZ, for the same reason.
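
To make the arithmetic behind that concrete: QUORUM needs floor(RF / 2) + 1 replicas to respond. The sketch below checks whether that is still possible after losing whole AZs. It assumes replicas are spread as evenly as possible across AZs (for example, racks mapped to AZs) and takes the worst case; the function names are hypothetical.

```python
def quorum(rf):
    """Replicas that must respond for QUORUM: floor(rf / 2) + 1."""
    return rf // 2 + 1


def survives_az_loss(rf, az_count, lost_azs=1):
    """True if QUORUM is still reachable after losing `lost_azs` zones.

    Assumes replicas are spread as evenly as possible across zones, and
    the worst case: the zones holding the most replicas are the ones lost.
    """
    base, extra = divmod(rf, az_count)
    # Replicas per zone, largest first: `extra` zones hold one more replica.
    per_zone = [base + 1] * extra + [base] * (az_count - extra)
    remaining = rf - sum(per_zone[:lost_azs])
    return remaining >= quorum(rf)


if __name__ == "__main__":
    print(quorum(3))               # 2
    print(survives_az_loss(3, 3))  # True: 2 of 3 replicas remain
    print(survives_az_loss(3, 2))  # False: worst case leaves 1 replica
```

With RF 3 over 3 AZs, each AZ holds one replica, so losing an AZ leaves 2, which is exactly the quorum. With only 2 AZs, one of them holds 2 replicas, and losing it drops you below quorum.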

4. I am not aware of an all-encompassing backup solution for JanusGraph.  Typically, you would rely on the backup tools native to the storage layer (use Medusa or something specific to Cassandra).  Perhaps someone else can answer this better than I?

5. Not really.  As use cases can vary, most ETL solutions are custom built.  Spark can be used to load data into JanusGraph.
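
As a rough idea of what a small custom loader can look like, the sketch below writes rows through a Gremlin traversal source instead of into Cassandra directly. It assumes the gremlinpython driver and a JanusGraph server reachable over WebSocket. The person label, the name and age properties, the knows edge label, the URL, and the load_people / load_friendships helpers are all hypothetical, and the name lookups assume a suitable index on name exists. Writing one element per request like this is only reasonable for small data sets.

```python
def load_people(g, rows):
    """Add one 'person' vertex per row via the traversal source `g`."""
    count = 0
    for row in rows:
        (g.addV("person")
          .property("name", row["name"])
          .property("age", row["age"])
          .iterate())
        count += 1
    return count


def load_friendships(g, pairs):
    """Add a 'knows' edge for each (source_name, target_name) pair."""
    count = 0
    for source, target in pairs:
        # Label the source vertex as "a", find the target, then add the
        # edge from "a" to the target vertex.
        (g.V().has("person", "name", source).as_("a")
          .V().has("person", "name", target)
          .addE("knows").from_("a")
          .iterate())
        count += 1
    return count


if __name__ == "__main__":
    # Requires `pip install gremlinpython` and a running JanusGraph server.
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.process.anonymous_traversal import traversal

    # Placeholder URL; "g" is the traversal source name configured on the server.
    conn = DriverRemoteConnection("ws://localhost:8182/gremlin", "g")
    g = traversal().withRemote(conn)
    try:
        load_people(g, [{"name": "alice", "age": 30}, {"name": "bob", "age": 31}])
        load_friendships(g, [("alice", "bob")])
    finally:
        conn.close()
```

The rows here are hard-coded, but in practice they would come from a read of the existing Cassandra tables, with whatever mapping to vertices and edges your use case needs.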

Regards,

Aaron


On Fri, Jun 5, 2020 at 12:50 PM gobi.ganesan via JanusGraph users <janusgra...@...> wrote:


Gobi Ganesan <gobi....@...>
 



Thank you for your valuable reply @Aaron Ploetz.