Re: Cleaning up old data in large graphs


hadoopmarc@...
 

Hi Mladen,

Just two things that came to mind while reading your story:
  • the Cassandra TTL feature seems promising for your use case, see e.g. https://www.geeksforgeeks.org/time-to-live-ttl-for-a-column-in-cassandra/ (I guess this would require code changes in janusgraph-cassandra, but see the first sketch after this list).
  • how is transaction control handled in the Spark jobs? You want transactions of a reasonable size (say 10,000 vertices or edges), and you want the Spark tasks to fail if the transaction commit fails. That way Spark will retry the task and hopefully succeed (see the second sketch after this list).
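
On the TTL point: if I am not mistaken, JanusGraph already exposes a schema-level TTL through its management API, and on Cassandra this is implemented with exactly that cell-level TTL, so it may be worth checking whether it already covers your case before patching janusgraph-cassandra. A minimal sketch, where the label name and the properties file path are just placeholders:

    import java.time.Duration;

    import org.janusgraph.core.JanusGraph;
    import org.janusgraph.core.JanusGraphFactory;
    import org.janusgraph.core.VertexLabel;
    import org.janusgraph.core.schema.JanusGraphManagement;

    public class TtlSketch {
        public static void main(String[] args) {
            // Placeholder config; the backend must support cell TTL (Cassandra does) for this to take effect.
            JanusGraph graph = JanusGraphFactory.open("conf/janusgraph-cql.properties");
            JanusGraphManagement mgmt = graph.openManagement();

            // TTL on a vertex label requires the label to be static; "event" is a made-up label name.
            VertexLabel event = mgmt.makeVertexLabel("event").setStatic().make();
            mgmt.setTTL(event, Duration.ofDays(90));

            mgmt.commit();
            graph.close();
        }
    }

With this, expired data is silently dropped by the storage backend, so no explicit cleanup job is needed; whether those semantics fit your graph model is for you to judge.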
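
On the transaction point, a rough sketch of what I mean, assuming the ids of the vertices to be removed are already available as a JavaRDD (all names, the config path and the batch size are just examples):

    import java.util.Iterator;

    import org.apache.spark.api.java.JavaRDD;
    import org.janusgraph.core.JanusGraph;
    import org.janusgraph.core.JanusGraphFactory;
    import org.janusgraph.core.JanusGraphTransaction;

    public class CleanupSketch {
        private static final int BATCH_SIZE = 10_000;

        public static void dropVertices(JavaRDD<Object> vertexIds, String propertiesFile) {
            vertexIds.foreachPartition((Iterator<Object> ids) -> {
                // One JanusGraph instance per partition; propertiesFile is the usual backend config.
                JanusGraph graph = JanusGraphFactory.open(propertiesFile);
                try {
                    JanusGraphTransaction tx = graph.newTransaction();
                    int mutations = 0;
                    while (ids.hasNext()) {
                        tx.traversal().V(ids.next()).drop().iterate();
                        if (++mutations >= BATCH_SIZE) {
                            // Do not swallow exceptions here: if the commit fails, the task fails
                            // and Spark reschedules it, so the deletes are retried.
                            tx.commit();
                            tx = graph.newTransaction();
                            mutations = 0;
                        }
                    }
                    tx.commit();
                } finally {
                    graph.close();
                }
            });
        }
    }

A retried task that already committed part of its batches should be harmless, because dropping a vertex that is already gone is a no-op.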

Best wishes,    Marc
