Re: Cassandra crashing after dropping large graph. Error: Scanned over 100001 tombstones...


Michael Kaiser-Cross <mkaise...@...>
 

Ok, that's one way to drop all the data in a table, but I can't do that in production because many users will share the same table. If I want to delete all nodes belonging to one user, for instance, I would need to do something like g.V().has("user", "john").drop(), and I will end up hitting this crash if too many users delete their data during that 10-day period. I would still like to know how to set this option if possible.
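In the meantime, the workaround I'm considering is altering the tables by hand in cqlsh after JanusGraph creates them. A minimal sketch, assuming the ns keyspace from the error below and a one-hour grace period (both values just for illustration):

    -- run in cqlsh once JanusGraph has created its tables
    ALTER TABLE ns.edgestore WITH gc_grace_seconds = 3600;
    ALTER TABLE ns.graphindex WITH gc_grace_seconds = 3600;

The caveat I've read is that gc_grace_seconds exists so that deletes aren't resurrected by replicas that missed them, so a repair would need to run on the cluster within whatever window is chosen.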

On Wednesday, December 26, 2018 at 2:21:37 AM UTC-5, willam boss wrote:
My cluster is based on HBase. If I want to drop all data, I use the hbase shell and execute: disable tablename, then drop tablename. This is the fastest way to delete data.
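For example (tablename here is a placeholder):

    hbase> disable 'tablename'
    hbase> drop 'tablename'

If you want to keep the table definition, truncate 'tablename' does the disable, drop, and recreate in one step.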

On Wednesday, December 26, 2018 at 3:06:57 PM UTC+8, Michael Kaiser-Cross wrote:
After loading a large graph (20k nodes) into JanusGraph a couple of times and then deleting it via g.V().drop(), my Cassandra server crashes with the message below.

data_storage_1     | ERROR [ReadStage-2] 2018-12-26 06:36:20,083 StorageProxy.java:1909 - Scanned over 100001 tombstones during query 'SELECT * FROM ns.edgestore WHERE column1 >= 02 AND column1 <= 03 LIMIT 100' (last scanned row partion key was ((d00000000001e300), 02)); query aborted

I found an article on deletes and tombstones in Cassandra here. If I understand correctly, I am creating too many tombstones too quickly, and the tombstones won't be purged until after the default grace period, which is 10 days. I also found a Stack Overflow question which suggests lowering the Cassandra gc_grace_seconds table option, so that tombstones become eligible for removal during compaction sooner.
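For reference, the current value can be checked from cqlsh (a sketch, assuming Cassandra 3.x and the ns keyspace from the error above):

    SELECT gc_grace_seconds FROM system_schema.tables
    WHERE keyspace_name = 'ns' AND table_name = 'edgestore';

which should show the default of 864000 seconds (10 days).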

I wanted to try this to see if it fixes the problem for me, but since JanusGraph creates the tables, how would I customize this value? Is there some way to set Cassandra table options when starting gremlin-server.sh?


Mike
