Re: Rapid deletion of vertices

Boxuan Li

Hi Scott,

One idea that comes to mind is to first collect all the vertex ids, and then delete them in batches, in parallel, using multiple threads.
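A minimal sketch of that idea, under some assumptions: the helper names (`chunked`, `drop_in_parallel`, `make_gremlin_dropper`) and the `chunk_size` and `workers` defaults are hypothetical, and it assumes the gremlinpython traversal source `g` can be shared safely across threads, which has not been verified for this setup. If it cannot, each worker would need its own connection.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(ids, size):
    """Split a list of ids into consecutive chunks of at most `size` items."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def drop_in_parallel(ids, drop_chunk, chunk_size=500, workers=4):
    """Call drop_chunk(chunk) for every chunk of ids on a thread pool.

    Returns the number of ids that were submitted for deletion.
    """
    chunks = chunked(ids, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all chunks and re-raises any worker exception.
        list(pool.map(drop_chunk, chunks))
    return len(ids)


def make_gremlin_dropper(g):
    """Build a drop_chunk callable bound to a gremlinpython traversal source.

    Assumption: `g` may be used from several threads at once.
    """
    def drop_chunk(chunk):
        # Select the vertices by id and drop them.
        g.V(*chunk).drop().iterate()
    return drop_chunk
```

Against a live graph this might be called as `drop_in_parallel(ids, make_gremlin_dropper(g))`, where `ids` is collected up front with something like `g.V().id_().toList()` (`id()` in older gremlinpython versions). Whether this is actually faster depends on the storage backend, so it is worth timing with a few different chunk sizes and worker counts.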

Best regards,

On May 5, 2022, at 5:57 PM, Scott Friedman <friedman@...> wrote:

Good afternoon,

We're running a docker-compose janusgraph:0.6.1 with cassandra:3 and elasticsearch:6.6.0.  We're primarily utilizing JanusGraph within Python 3.8 via gremlinpython.

We frequently reset our graph store to run an experiment or demonstration.  To date, we've either (1) dropped the graph, re-loaded our schema, and re-defined our indices, or (2) deleted all the vertices so that the schema and indices are kept.  Option 2 is often faster (and less error-prone), but it becomes slow for large graphs.  I hope somebody can offer some advice that will speed up our resetting-the-graph workflow with JanusGraph.

For deleting 6K nodes (and many incident edges), here's the timing data:

2022-05-05 16:40:44,261 - INFO - Deleting batch 1.

2022-05-05 16:41:09,961 - INFO - Deleting batch 2.

2022-05-05 16:41:27,689 - INFO - Deleting batch 3.

2022-05-05 16:41:43,678 - INFO - Deleting batch 4.

2022-05-05 16:41:45,561 - INFO - Deleted 6226 vertices over 4 batch(es).

It takes roughly 1 minute to delete 6K vertices in batches of 2000.

Here's our Python code for deleting the nodes:

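        # Assumes: from gremlin_python.process.graph_traversal import __
        batches = 0
        nodes = 0
        while True:
            batches += 1
            com.log(f'Deleting batch {batches}.')
            # Drop up to batch_size vertices and count how many were dropped.
            num_nodes = g.V().limit(batch_size).sideEffect(__.drop()).count().next()
            nodes += num_nodes
            # A short batch means no vertices are left.
            if num_nodes < batch_size:
                break
        log(f'Deleted {nodes} nodes over {batches} batch(es).')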

This never fails, but it's obviously quite slow, especially for larger graphs.  Is there a way to speed this up?  We haven't tried running it async, since we're not sure how to do so safely.

Thanks in advance for any wisdom!

