Cassandra crashing after dropping large graph. Error: Scanned over 100001 tombstones...
Michael Kaiser-Cross <mkaise...@...>
Ok, thanks, that is helpful. That solves my problem, I guess, but I still think it would be a good idea to implement the ability to change the gc_grace_seconds of a graph. People might run into this issue as they scale apps in production, especially apps that do a lot of deleting.
On Wednesday, December 26, 2018 at 3:59:20 AM UTC-5, Reinhard wrote:
It is easy to prevent this by increasing a threshold in cassandra.yaml: tombstone_failure_threshold. It is set to 100000.
Reinhard <med...@...>
It is easy to prevent this by increasing a threshold in cassandra.yaml: tombstone_failure_threshold. It is set to 100000.
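For reference, the two relevant knobs live in cassandra.yaml; a minimal sketch with the Cassandra defaults shown (raise tombstone_failure_threshold, and usually tombstone_warn_threshold with it, to your own values):

```yaml
# cassandra.yaml -- tombstone scan limits (defaults shown)
tombstone_warn_threshold: 1000       # log a warning when a query scans this many tombstones
tombstone_failure_threshold: 100000  # abort the query (the error above) past this many
```

Raising the limit only hides the symptom, though; queries that scan hundreds of thousands of tombstones will still be slow.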
Michael Kaiser-Cross <mkaise...@...>
Ok, that's one way to drop all data in a table, but I can't do that in production because many users will share the same table. If I want to delete all nodes belonging to one user, for instance, I would need to do something like g.V().has("user", "john").drop(), and I will end up with this crash if too many users delete their data during that 10-day period. I would still like to know how to set this option if possible.
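For what it's worth, since JanusGraph creates its tables with Cassandra's defaults, one workaround outside of JanusGraph itself is to change the option afterwards from cqlsh. A sketch, with the keyspace and table names taken from the error message above (yours may differ):

```sql
-- run in cqlsh after JanusGraph has created its tables
ALTER TABLE ns.edgestore WITH gc_grace_seconds = 86400;  -- one day instead of the 10-day default
```

Note that gc_grace_seconds exists to prevent deleted data from reappearing after a node outage; it should stay comfortably above the interval at which nodetool repair runs on the cluster.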
On Wednesday, December 26, 2018 at 2:21:37 AM UTC-5, willam boss wrote:
My cluster is based on HBase. If I want to drop all data, I use the hbase shell and execute: disable tablename, then drop tablename. This is the fastest way to delete data.
willam boss <jcbm...@...>
My cluster is based on HBase. If I want to drop all data, I use the hbase shell and execute: disable tablename, then drop tablename. This is the fastest way to delete data.
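Sketched out, assuming the default JanusGraph table name (yours may differ):

```shell
# in hbase shell -- disable, then drop, the backing table
disable 'janusgraph'
drop 'janusgraph'
```

JanusGraph should recreate the table on its next startup, but all data and schema stored in it are gone.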
Michael Kaiser-Cross <mkaise...@...>
After loading a large graph (20k nodes) into JanusGraph a couple of times and then deleting it via g.V().drop(), my Cassandra server crashes with the message below.
data_storage_1 | ERROR [ReadStage-2] 2018-12-26 06:36:20,083 StorageProxy.java:1909 - Scanned over 100001 tombstones during query 'SELECT * FROM ns.edgestore WHERE column1 >= 02 AND column1 <= 03 LIMIT 100' (last scanned row partion key was ((d00000000001e300), 02)); query aborted
I found an article on deleting and tombstones in Cassandra here. If I understand correctly, I am creating too many tombstones too quickly, and the tombstones won't be deleted until after the default grace period, which is 10 days. I also found this Stack Overflow question, which suggests lowering the Cassandra gc_grace_seconds table option so that tombstones are removed more frequently.
I wanted to try this to see if it fixes the problem, but since JanusGraph creates the tables, how would I customize this value? Is there some way to set Cassandra table options when starting gremlin-server.sh?
Mike
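One common mitigation for the full-graph drop described above is to delete in bounded, committed batches rather than in one enormous transaction. A hypothetical Gremlin sketch, assuming an embedded graph instance named graph and an arbitrary batch size of 1000:

```groovy
// drop vertices in batches instead of one huge g.V().drop()
while (g.V().limit(1).hasNext()) {
    g.V().limit(1000).drop().iterate()
    graph.tx().commit()
}
```

This does not reduce the total number of tombstones eventually written, but it avoids a single oversized mutation and gives compaction a chance to run between batches.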