[Titan, Cassandra] Query performance degrades with network latency between Cassandra nodes


bdi...@...
 

We are using Titan and Cassandra and will be moving to JanusGraph in the future.

I am observing that graph query performance degrades as latency between Cassandra nodes increases. We are using read consistency level (CL) ONE. I assumed that with CL ONE there is no traffic between Cassandra nodes for graph queries. Is this assumption correct?
If not, am I missing something here?

Thanks
Bharat


Jason Plurad <plu...@...>
 

It depends on your replication factor. If you didn't set it explicitly, the default is RF=1. As you add nodes and the data gets distributed across the Cassandra nodes, the coordinator may need to connect to another Cassandra node to find the data.
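To make the RF point concrete, here is a minimal sketch of SimpleStrategy-style placement on a token ring. It assumes one evenly spaced token per node (no vnodes), uses MD5 as a stand-in partitioner, and ignores the snitch and read repair; `build_ring`, `replicas` and `remote_fraction` are hypothetical helpers, not Cassandra or Titan APIs. It estimates how often a coordinator does not hold the requested key itself, so even a CL ONE read has to be forwarded to another node.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2**64

def token(key: str) -> int:
    """Stand-in partitioner (assumption): first 8 bytes of MD5 as an unsigned int."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

def build_ring(nodes):
    """Give each node one evenly spaced token; returns sorted (token, node) pairs."""
    step = RING_SIZE // len(nodes)
    return [((i + 1) * step - 1, node) for i, node in enumerate(nodes)]

def replicas(key, ring, rf):
    """Owner = first node whose token >= the key's token (wrapping around),
    plus the next rf-1 distinct nodes clockwise."""
    tokens = [t for t, _ in ring]
    i = bisect_left(tokens, token(key)) % len(ring)
    result = []
    while len(result) < min(rf, len(ring)):
        node = ring[i][1]
        if node not in result:
            result.append(node)
        i = (i + 1) % len(ring)
    return result

def remote_fraction(coordinator, nodes, rf, n_keys=10_000):
    """Fraction of keys the coordinator is not a replica for, i.e. reads it must forward."""
    ring = build_ring(nodes)
    misses = sum(coordinator not in replicas(f"vertex-{i}", ring, rf)
                 for i in range(n_keys))
    return misses / n_keys

print(remote_fraction("A", ["A", "B"], rf=1))  # roughly half of the reads go to node B
print(remote_fraction("A", ["A", "B"], rf=2))  # every key is local to A
```

With two nodes and RF=1, about half the reads need a second hop regardless of CL; with RF=2 every node holds every key in this model.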

Datastax has the best docs on Cassandra. Read them while you can!
https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/architectureClientRequestsRead_c.html


(In reply to Bharat Dighe, Tuesday, March 7, 2017 at 7:56:12 PM UTC-5)


bdi...@...
 

This is a two-node Cassandra cluster using NetworkTopologyStrategy. Effective data ownership is 100% on both nodes.

KEYSPACE my_ks;

CREATE KEYSPACE my_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '1', 'dc2': '1'} AND durable_writes = true;

# /opt/cassandra/bin/nodetool status my_ks
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens  Owns (effective)  Host ID                               Rack
UN  <IP1>  4.73 MB    256     100.0%            922409ad-cc19-446c-93e3-f02551d130dd  rac1
Datacenter: dc2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens  Owns (effective)  Host ID                               Rack
UN  <IP2>  4.78 MB    256     100.0%            0afefc82-9783-4dcc-8aa8-6aafc42a979f  rac1

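The 100% "Owns (effective)" figures follow directly from the keyspace definition: one replica per DC and one node per DC means each node stores a full copy. The sketch below reproduces that arithmetic under the assumption that tokens are evenly balanced within each DC; `effective_ownership` is a hypothetical helper, not a Cassandra or nodetool API.

```python
def effective_ownership(replication, nodes_per_dc):
    """Approximate the 'Owns (effective)' column for each node under
    NetworkTopologyStrategy, assuming evenly balanced tokens inside each DC.
    A node in a DC holds min(RF_dc, nodes_in_dc) / nodes_in_dc of the data."""
    ownership = {}
    for dc, node_count in nodes_per_dc.items():
        rf = replication.get(dc, 0)  # a DC missing from the map gets no replicas
        ownership[dc] = min(rf, node_count) / node_count
    return ownership

# my_ks: RF 1 in each DC, one node in each DC (as in the nodetool output)
print(effective_ownership({"dc1": 1, "dc2": 1}, {"dc1": 1, "dc2": 1}))
# {'dc1': 1.0, 'dc2': 1.0}

# For contrast: the same replication map with a second node added to dc1
print(effective_ownership({"dc1": 1, "dc2": 1}, {"dc1": 2, "dc2": 1}))
# {'dc1': 0.5, 'dc2': 1.0}
```

In this layout every key has a local replica in dc1, so ownership alone does not force a cross-DC hop for a single-replica read.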
When I run queries against my_ks with cqlsh, the response time stays constant as latency between dc1 and dc2 increases.

With Titan, however, query performance degrades as latency between dc1 and dc2 goes up. Is Titan doing extra work that makes it reach out to the other node, even though the read CL is ONE and the local node owns 100% of the data?

Here are the graph config parameters.

storage.backend=cassandra
storage.cassandra.read-consistency-level=ONE
storage.cassandra.write-consistency-level=ALL
storage.hostname=localhost
storage.port=9160
storage.cassandra.keyspace=my_ks
storage.cassandra.astyanax.local-datacenter=dc1
storage.cassandra.astyanax.connection-pool-type=ROUND_ROBIN
storage.cassandra.astyanax.node-discovery-type=NONE
storage.read-only=true
query.fast-property=true
query.force-index=true
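For reference, the sketch below counts how many replica responses a coordinator waits for at a given consistency level, using the my_ks replication map. It follows the documented CL definitions (quorum = sum of replication factors / 2 + 1, rounded down); `block_for` is a hypothetical helper, not a Titan or Cassandra client call. Note that it only counts responses: CL ONE fixes how many replicas must answer, not which replica the coordinator and snitch choose to ask.

```python
def block_for(cl, replication, local_dc):
    """Number of replica responses a coordinator waits for at consistency level `cl`.
    `replication` maps datacenter name -> replication factor."""
    total = sum(replication.values())
    if cl in ("ONE", "LOCAL_ONE"):
        return 1
    if cl == "TWO":
        return 2
    if cl == "THREE":
        return 3
    if cl == "QUORUM":
        return total // 2 + 1
    if cl == "LOCAL_QUORUM":
        return replication[local_dc] // 2 + 1
    if cl == "EACH_QUORUM":
        return sum(rf // 2 + 1 for rf in replication.values())
    if cl == "ALL":
        return total
    raise ValueError(f"unsupported consistency level: {cl}")

my_ks = {"dc1": 1, "dc2": 1}
print(block_for("ONE", my_ks, "dc1"))     # 1: any single replica is enough
print(block_for("QUORUM", my_ks, "dc1"))  # 2: would require the dc2 replica too
print(block_for("ALL", my_ks, "dc1"))     # 2: write CL ALL waits on both DCs
```

With the configuration above, reads need one response; the write CL of ALL would wait on dc2 for every write, although storage.read-only=true means this graph instance should not be writing.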

Thanks
Bharat


(In reply to Jason Plurad, Tuesday, March 7, 2017 at 5:28:47 PM UTC-8)