Query failure due to cassandra backend tombstone exception #1675


Alex Lawrence <alexn...@...>
 

Hi all,

I have raised a GitHub issue about this (https://github.com/JanusGraph/janusgraph/issues/1675) and am pasting its contents below.

JanusGraph version: janusgraph-0.3.1
Cassandra version: cassandra:3.11.4

When we run JanusGraph with the Cassandra backend, after a period of time it starts throwing the errors below and becomes unusable.

JanusGraph logs:
466489 [gremlin-server-exec-6] INFO org.janusgraph.diskstorage.util.BackendOperation - Temporary exception during backend operation [EdgeStoreKeys]. Attempting backoff retry.
org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend
    at io.vavr.API$Match$Case0.apply(API.java:3174)
    at io.vavr.API$Match.of(API.java:3137)
    at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.lambda$static$0(CQLKeyColumnValueStore.java:123)
    at io.vavr.control.Try.getOrElseThrow(Try.java:671)
    at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.getKeys(CQLKeyColumnValueStore.java:405)

Caused by: com.datastax.driver.core.exceptions.ReadFailureException: Cassandra failure during read query at consistency QUORUM (1 responses were required but only 0 replica responded, 1 failed)
    at com.datastax.driver.core.exceptions.ReadFailureException.copy(ReadFailureException.java:130)
    at com.datastax.driver.core.exceptions.ReadFailureException.copy(ReadFailureException.java:30)

Cassandra Logs:
WARN [ReadStage-2] 2019-07-19 11:40:02,980 ReadCommand.java:569 - Read 74 live rows and 100001 tombstone cells for query SELECT * FROM janusgraph.edgestore WHERE column1 >= 02 AND column1 <= 03 LIMIT 100 (see tombstone_warn_threshold)
ERROR [ReadStage-2] 2019-07-19 11:40:02,980 StorageProxy.java:1896 - Scanned over 100001 tombstones during query 'SELECT * FROM janusgraph.edgestore WHERE column1 >= 02 AND column1 <= 03 LIMIT 100' (last scanned row partion key was ((00000000002b9d88), 02)); query aborted

Related Issue:
#934

Related question on Stack Overflow:
https://stackoverflow.com/questions/47069563/cassandra-failure-during-read-query-at-consistency-local-one-1-responses-were-r

Solutions Suggested:
https://groups.google.com/forum/#!searchin/janusgraph-users/cassandra$20tombstones%7Csort:date/janusgraph-users/nWc6EXmhn50/BOsALnn6DAAJ

Questions:

  1. What is the right approach to this?
  2. Are edge updates stored as new items, causing tombstones (since JanusGraph is a fork of Titan)? https://stackoverflow.com/questions/36542748/how-to-increment-number-of-visit-count-in-titan-graph-database-edge-label/36544160#36544160

Any solution would be really helpful. Thanks.


Alex Lawrence <alexn...@...>
 

  1. Updates to edges did not cause tombstones in JanusGraph.

Solutions:

  • Reduce gc_grace_seconds to a lower value, chosen according to how often edges/vertices are deleted.
  • Also consider tuning "tombstone_failure_threshold" in cassandra.yaml as needed.
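
For anyone wanting to try the first suggestion, here is a minimal sketch of how the change might be expressed. The table name janusgraph.edgestore is taken from the Cassandra log in the original post; the helper name gc_grace_statement and the value 86400 (one day) are only illustrative assumptions, not recommendations. Note that a gc_grace_seconds value shorter than the interval between repairs risks deleted data reappearing, and that tombstones are only dropped at compaction after the grace period has passed. For the second suggestion, tombstone_failure_threshold defaults to 100000 in cassandra.yaml, which matches the 100001 tombstones reported in the log.

```python
import re

# Plain (unquoted) CQL identifiers only; anything else is rejected.
_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def gc_grace_statement(keyspace: str, table: str, seconds: int) -> str:
    """Build a CQL ALTER TABLE statement that sets gc_grace_seconds.

    Hypothetical helper for illustration; it only builds the text of the
    statement, which can then be run in cqlsh or through a driver.
    """
    for name in (keyspace, table):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"not a plain CQL identifier: {name!r}")
    if seconds < 0:
        raise ValueError("gc_grace_seconds must be non-negative")
    return f"ALTER TABLE {keyspace}.{table} WITH gc_grace_seconds = {seconds};"


if __name__ == "__main__":
    # 86400 seconds (one day) is an assumed example value.
    print(gc_grace_statement("janusgraph", "edgestore", 86400))
```

The printed statement can be pasted into cqlsh. Other JanusGraph tables in the keyspace would need the same change if they also accumulate tombstones.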


Clement de Groc
 

The next JanusGraph release will allow tuning Cassandra's gc_grace_seconds: https://github.com/JanusGraph/janusgraph/pull/2693