Rapid deletion of vertices

Scott Friedman

Good afternoon,

We're running janusgraph:0.6.1 under docker-compose with cassandra:3 and elasticsearch:6.6.0.  We primarily use JanusGraph from Python 3.8 via gremlinpython.

We frequently reset our graph store to run an experiment or demonstration.  To date, we've either (1) dropped the graph, then re-loaded our schema and re-defined our indices, or (2) deleted all the vertices so that the schema and indices stay in place.  Often #2 is faster (and less error-prone), but it's slower for large graphs.  I hope somebody can lend some advice that will speed up our graph-reset workflow with JanusGraph.

For deleting 6K nodes (and many incident edges), here's the timing data:

2022-05-05 16:40:44,261 - INFO - Deleting batch 1.

2022-05-05 16:41:09,961 - INFO - Deleting batch 2.

2022-05-05 16:41:27,689 - INFO - Deleting batch 3.

2022-05-05 16:41:43,678 - INFO - Deleting batch 4.

2022-05-05 16:41:45,561 - INFO - Deleted 6226 vertices over 4 batch(es).

...so it takes roughly 1 minute to delete 6K vertices in batches of 2000.
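
To make the per-batch cost explicit, here is a minimal sketch that derives the durations from the timestamps above. It assumes each "Deleting batch" line marks the start of a batch and the final summary line marks the end of the last one; the helper name batch_durations is made up for illustration and is not part of our code base.

```python
from datetime import datetime

# Timestamps copied from the log lines above; the last one is the summary line.
STAMPS = [
    "2022-05-05 16:40:44,261",
    "2022-05-05 16:41:09,961",
    "2022-05-05 16:41:27,689",
    "2022-05-05 16:41:43,678",
    "2022-05-05 16:41:45,561",
]


def batch_durations(stamps):
    """Return the seconds elapsed between consecutive log timestamps."""
    times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S,%f") for s in stamps]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


durations = batch_durations(STAMPS)
total = sum(durations)          # wall-clock seconds for the whole reset
rate = 6226 / total             # vertices deleted per second overall

print([round(d, 1) for d in durations], round(total, 1), round(rate))
```

By this reckoning the three full batches of 2000 take about 25.7 s, 17.7 s, and 16.0 s, the final partial batch (226 vertices) takes about 1.9 s, and the overall rate is roughly 100 vertices per second.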

Here's our Python code for deleting the nodes:

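        # Assumes: from gremlin_python.process.graph_traversal import __
        batches = 0
        nodes = 0
        while True:
            batches += 1
            com.log(f'Deleting batch {batches}.')
            # Drop up to batch_size vertices and count how many were dropped.
            num_nodes = g.V().limit(batch_size).sideEffect(__.drop()).count().next()
            nodes += num_nodes
            # A short batch means no vertices are left.
            if num_nodes < batch_size:
                break
        com.log(f'Deleted {nodes} nodes over {batches} batch(es).')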

This never fails, but it's obviously quite slow, especially for larger graphs.  Is there a way to speed this up?  We haven't tried running it async, since we're not sure how to do so safely.

Thanks in advance for any wisdom!

