Cassandra crashing after dropping large graph. Error: Scanned over 100001 tombstones...
Michael Kaiser-Cross <mkaise...@...>
After loading a large graph (20k nodes) into JanusGraph a couple of times and then deleting it via g.V().drop(), my Cassandra server crashes with the message below.

data_storage_1 | ERROR [ReadStage-2] 2018-12-26 06:36:20,083 StorageProxy.java:1909 - Scanned over 100001 tombstones during query 'SELECT * FROM ns.edgestore WHERE column1 >= 02 AND column1 <= 03 LIMIT 100' (last scanned row partion key was ((d00000000001e300), 02)); query aborted

I found an article on deleting and tombstones in Cassandra. If I understand correctly, I am creating too many tombstones too quickly, and the tombstones won't be removed until after the default grace period, which is 10 days. I also found a Stack Overflow question which suggests lowering the Cassandra gc_grace_seconds table option, which would result in tombstones being removed more frequently. I wanted to try this to see if it fixes the problem for me, but since JanusGraph creates the tables, how would I customize this value? Is there some way to set Cassandra table options when starting gremlin-server.sh?

Mike
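For a quick experiment, gc_grace_seconds can also be changed after the fact with plain CQL against the table JanusGraph already created (a sketch using the keyspace and table names from the error message above; the value is in seconds):

```sql
-- Lower the tombstone grace period on JanusGraph's edgestore table.
-- 3600 s = 1 hour; the Cassandra default is 864000 s (10 days).
ALTER TABLE ns.edgestore WITH gc_grace_seconds = 3600;
```

One caveat: lowering gc_grace_seconds is only safe if repairs run more often than the new grace period, otherwise deleted data can reappear on replicas that missed the delete.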
willam boss <jcbm...@...>
My cluster is based on HBase. If I want to drop all data, I use the HBase shell and execute: disable tablename, then drop tablename. This is the fastest way to delete data.

On Wednesday, December 26, 2018 at 3:06:57 PM UTC+8, Michael Kaiser-Cross wrote:
Michael Kaiser-Cross <mkaise...@...>
Ok, that's one way to drop all the data in a table, but I can't do that in production because many users will share the same table. If I want to delete all nodes belonging to one user, for instance, I would need to do something like g.V().has("user", "john").drop(), and I will end up with this crash if too many users delete their data during that 10-day period. I would still like to know how to set this option if possible.

On Wednesday, December 26, 2018 at 2:21:37 AM UTC-5, willam boss wrote:
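One way to soften the tombstone burst (a sketch, not a confirmed fix; the "user" property key and "john" value are just illustrative) is to drop a user's vertices in small committed batches rather than in one huge traversal, so the deletes are spread out over time:

```groovy
// Sketch for the Gremlin console: delete one user's vertices in
// batches of 500, committing after each batch so tombstones are
// created gradually instead of all at once.
def batchSize = 500
while (g.V().has("user", "john").hasNext()) {
    g.V().has("user", "john").limit(batchSize).drop().iterate()
    graph.tx().commit()
    sleep(1000)  // brief pause between batches to ease compaction pressure
}
```

This doesn't reduce the total number of tombstones, but it can keep any single read from scanning enough of them to trip Cassandra's failure threshold.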
Reinhard <med...@...>
It is easy to prevent this by increasing a threshold in cassandra.yaml: tombstone_failure_threshold. It is set to 100000 by default.

On Wednesday, December 26, 2018 at 8:06:57 AM UTC+1, Michael Kaiser-Cross wrote:
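For reference, the relevant cassandra.yaml settings look like this (the values below are illustrative; raising the failure threshold trades the early-abort safety net for more heap pressure on tombstone-heavy reads):

```yaml
# cassandra.yaml
# Log a warning when a single query scans this many tombstones (default 1000).
tombstone_warn_threshold: 1000
# Abort queries that scan more than this many tombstones (default 100000).
tombstone_failure_threshold: 500000
```

A rolling restart of the Cassandra nodes is needed for the change to take effect.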
Michael Kaiser-Cross <mkaise...@...>
Ok, thanks, that is helpful. That solves my problem, I guess, but I still think it would be a good idea to implement the ability to change the gc_grace_seconds of a graph. People might run into this issue as they scale apps in production, especially apps that do a lot of deleting.

On Wednesday, December 26, 2018 at 3:59:20 AM UTC-5, Reinhard wrote:
Clement de Groc
Next JanusGraph release will allow tuning Cassandra's gc_grace_seconds: https://github.com/JanusGraph/janusgraph/pull/2693
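Once that release lands, the setting would presumably go in the JanusGraph properties file alongside the other CQL backend options. The option name below is an assumption; verify it against the merged PR before relying on it:

```properties
# janusgraph-cql.properties (gc-grace-seconds name assumed; check PR 2693)
storage.backend=cql
storage.hostname=127.0.0.1
storage.cql.keyspace=janusgraph
storage.cql.gc-grace-seconds=3600
```

Note this only applies at table creation time for tables JanusGraph creates; existing tables would still need an ALTER TABLE in cqlsh.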