Re: Cleaning up old data in large graphs


Mladen Marović
 

Hi Mark,

thanks for the response.

  1. As described in https://docs.janusgraph.org/schema/advschema/, TTL is already supported. However, there are two issues in my case:

    a) Changing the TTL is supported, but the new TTL is only applied on inserts and updates. In other words, if I have a TTL of 12 months and change it to 18 months, it will effectively take up to 12 months before the change applies to all data, because existing data still carries the old 12-month TTL. A possible workaround would be to iterate over all objects in the database and update them in some way to force the new TTL, although that seems rather costly.

    b) I'm not sure exactly how the TTL setting applies in JanusGraph. Is it set only on the data, or on the composite indexes as well? If it's set only on the data, then over time the indexes will fill up with entries pointing to data that no longer exists. I can confirm this is the case for mixed indexes: during testing, the data was deleted in Cassandra, but the mixed index entries in Elasticsearch were not, which means I would have to delete them manually as well. That would be acceptable if JanusGraph supported backing a single mixed index with multiple Elasticsearch indices (which would be a really cool feature, by the way!), but I don't think it does. I tried to trick JanusGraph into using an alias, but it did not work as expected.

  2. I don't think the problem in the Spark jobs is related to transactions. By default, if a task throws an exception, Spark retries it, and the job only completes once every task has finished successfully, which is what happens here. Also, in my case there are actually no exceptions at all. I even managed to find the problematic vertices manually via the Gremlin console, but their valueMap() is {}, whereas I would expect it to contain the 10-15 properties these vertices usually have if they hadn't been deleted. Basically, JanusGraph acts as if it found a vertex (or some part of it), but deleting it has no effect.

    If I remember correctly, when I tried to analyze this a while ago, I found a place in the JanusGraph source where a dummy (empty) vertex is created if JanusGraph does not find the expected data. I guess that's what happens when I get the {} result: maybe the index entry wasn't cleaned up, so JanusGraph thinks there should be something, finds nothing, and returns the empty vertex. When I try to delete it, there is again nothing to delete, so the index entry isn't cleared. I don't know if that's actually possible, but it might explain what I'm seeing.
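To make point 1a concrete, here is a toy Python model of the TTL behaviour described there. It is an assumption about the mechanism, not JanusGraph or Cassandra code: each stored cell keeps the TTL that was configured when it was written, so changing the configured TTL only affects later writes, and rewriting every object is what forces the new TTL onto old data. `TtlStore`, `write` and `rewrite_all` are hypothetical names; time is measured in months.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cell:
    value: str
    written_at: int  # month in which the cell was (re)written
    ttl: int         # TTL in months, stamped on the cell at write time

    def expires_at(self) -> int:
        return self.written_at + self.ttl


class TtlStore:
    """Toy store in which every cell keeps the TTL that applied when it was written."""

    def __init__(self, ttl: int) -> None:
        self.ttl = ttl  # currently configured TTL; changing it does not touch existing cells
        self.cells: dict[str, Cell] = {}

    def write(self, key: str, value: str, now: int) -> None:
        # Inserts and updates pick up the TTL configured at this moment.
        self.cells[key] = Cell(value, now, self.ttl)

    def rewrite_all(self, now: int) -> None:
        # The costly workaround: re-write every object so it gets the current TTL.
        for key, cell in list(self.cells.items()):
            self.write(key, cell.value, now)

    def live_keys(self, now: int) -> set[str]:
        return {key for key, cell in self.cells.items() if cell.expires_at() > now}


store = TtlStore(ttl=12)
store.write("v1", "old", now=0)   # written under the 12-month TTL
store.ttl = 18                    # TTL changed to 18 months at month 6
store.write("v2", "new", now=6)   # written under the 18-month TTL
print(sorted(store.live_keys(now=12)))  # ['v2']: v1 still expires on the old schedule

store2 = TtlStore(ttl=12)
store2.write("v1", "old", now=0)
store2.ttl = 18
store2.rewrite_all(now=6)                # touch every object once
print(store2.cells["v1"].expires_at())   # 24
```

The rewrite makes v1 outlive month 12, but only because every object was written again, which is exactly the cost the workaround implies on a large graph.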
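Regarding point 1b, a minimal sketch of what a manual cleanup of orphaned mixed-index entries could look like, assuming the storage backend expires data by TTL while the search index keeps its entries indefinitely (the behaviour observed with Cassandra and Elasticsearch). Both stores are plain in-memory dicts here; `Graph` and `sweep_stale_entries` are hypothetical names, and a real sweep would have to page through the Elasticsearch documents and check each referenced vertex in the graph.

```python
from __future__ import annotations


class Graph:
    """Toy model: storage expires vertices by TTL, the search index never expires anything."""

    def __init__(self) -> None:
        self.vertices: dict[int, tuple[dict[str, str], int]] = {}  # id -> (properties, expires_at)
        self.index: dict[str, set[int]] = {}                       # indexed value -> vertex ids

    def add_vertex(self, vid: int, name: str, now: int, ttl: int) -> None:
        self.vertices[vid] = ({"name": name}, now + ttl)
        self.index.setdefault(name, set()).add(vid)

    def properties(self, vid: int, now: int) -> dict[str, str]:
        # Expired (or unknown) vertices have no readable properties.
        props, expires_at = self.vertices.get(vid, ({}, 0))
        return props if expires_at > now else {}

    def lookup(self, name: str) -> set[int]:
        # The index still returns ids whose data has already expired in storage.
        return set(self.index.get(name, set()))


def sweep_stale_entries(graph: Graph, now: int) -> int:
    """Hypothetical cleanup: remove index entries that point at vertices with no live data."""
    removed = 0
    for ids in graph.index.values():
        stale = {vid for vid in ids if not graph.properties(vid, now)}
        ids.difference_update(stale)
        removed += len(stale)
    return removed


g = Graph()
g.add_vertex(1, "a", now=0, ttl=12)
g.add_vertex(2, "a", now=6, ttl=12)
print(sorted(g.lookup("a")))           # [1, 2]: the index has no notion of expiry
print(sweep_stale_entries(g, now=13))  # 1 (vertex 1 expired at month 12)
print(sorted(g.lookup("a")))           # [2]
```

The sweep has to visit every index entry, which is why being able to drop a whole Elasticsearch index at once would be so much cheaper.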
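The reasoning at the start of point 2 relies on Spark's retry behaviour: a failed task is re-run (controlled by `spark.task.maxFailures`, which allows up to 4 attempts by default), and the job fails if any single task keeps failing. So a job that completes means every task eventually succeeded. Below is a simplified Python model of that rule, not Spark's actual scheduler; `run_task`, `run_job` and `flaky` are hypothetical.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_task(attempt_fn: Callable[[int], T], max_failures: int = 4) -> T:
    """Run one task, retrying until it succeeds or has failed max_failures times."""
    for attempt in range(1, max_failures + 1):
        try:
            return attempt_fn(attempt)
        except Exception:
            if attempt == max_failures:
                raise  # retries exhausted: this fails the whole job
    raise ValueError("max_failures must be at least 1")


def run_job(tasks: List[Callable[[int], T]], max_failures: int = 4) -> List[T]:
    # Returning at all implies that every task eventually succeeded.
    return [run_task(task, max_failures) for task in tasks]


def flaky(attempt: int) -> str:
    if attempt == 1:
        raise RuntimeError("transient failure")  # succeeds on the retry
    return "done"


print(run_job([flaky, lambda attempt: "done"]))  # ['done', 'done']
```

In the situation described above no exception is raised at all, so the retry path is never even exercised; the deletions simply have no effect.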
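The hypothesis at the end of point 2 can be written down as a toy model. This only sketches the suspected behaviour and is not JanusGraph's actual code path: looking up a vertex whose data is gone yields an empty stand-in (like the {} returned by valueMap()), and removing a vertex cleans up only the index entries derived from the properties it still has, so a vertex whose data has already expired leaves its index entry behind. `GhostGraph` and its methods are hypothetical.

```python
from __future__ import annotations


class GhostGraph:
    """Toy model of the suspected behaviour, not JanusGraph's implementation."""

    def __init__(self) -> None:
        self.store: dict[int, dict[str, str]] = {}  # vertex id -> properties
        self.index: dict[str, set[int]] = {}        # indexed value -> vertex ids

    def add_vertex(self, vid: int, name: str) -> None:
        self.store[vid] = {"name": name}
        self.index.setdefault(name, set()).add(vid)

    def expire(self, vid: int) -> None:
        # Storage-level TTL removes the data but not the index entry.
        self.store.pop(vid, None)

    def value_map(self, vid: int) -> dict[str, str]:
        # A vertex whose data is gone comes back empty instead of raising.
        return dict(self.store.get(vid, {}))

    def remove_vertex(self, vid: int) -> None:
        props = self.store.pop(vid, {})
        # Index cleanup is driven by the vertex's current properties;
        # an empty vertex yields no index keys to delete.
        for value in props.values():
            self.index.get(value, set()).discard(vid)

    def find(self, name: str) -> list[int]:
        return sorted(self.index.get(name, set()))


g = GhostGraph()
g.add_vertex(1, "a")
g.expire(1)              # data gone, index entry kept
print(g.value_map(1))    # {}
g.remove_vertex(1)       # nothing to derive index keys from
print(g.find("a"))       # [1]: the stale entry survives the delete
```

If this is what happens, deleting such vertices through the graph can never clear the entries, and they would have to be removed from the index side directly.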

Best regards,

Mladen Marović

Join janusgraph-users@lists.lfaidata.foundation to automatically receive all group messages.