I'm currently working on high-throughput ingestion into JanusGraph (backed by Cassandra).
The ingested data consists of vertices with 3-4 properties each, plus edges connecting them.
For each event I need to upsert vertices and create edges:
I traverse the composite index twice to get the two vertices (if I don't find one, I create it).
Then I add the edge.
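The per-event flow above can be modeled like this. It is a minimal in-memory sketch, not JanusGraph code: a dict stands in for the composite index, and the hypothetical `index_lookups` counter shows that every event pays two index reads before its single edge write.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryGraph:
    # Hypothetical model: `index` stands in for the composite index (natural key -> vertex id)
    index: dict = field(default_factory=dict)
    vertices: dict = field(default_factory=dict)  # vertex id -> properties
    edges: list = field(default_factory=list)     # (out id, label, in id)
    next_id: int = 1
    index_lookups: int = 0

    def get_or_create(self, key, props):
        # One index read per vertex, whether or not the vertex already exists
        self.index_lookups += 1
        vid = self.index.get(key)
        if vid is None:
            vid = self.next_id
            self.next_id += 1
            self.index[key] = vid
            self.vertices[vid] = dict(props)
        return vid

    def handle_event(self, src_key, dst_key, label, src_props=None, dst_props=None):
        # Two existence checks, then one edge write
        out_id = self.get_or_create(src_key, src_props or {})
        in_id = self.get_or_create(dst_key, dst_props or {})
        self.edges.append((out_id, label, in_id))
        return out_id, in_id


g = InMemoryGraph()
g.handle_event("user:1", "device:9", "uses", {"name": "a"})
g.handle_event("user:1", "device:7", "uses")
print(g.index_lookups, len(g.vertices), len(g.edges))  # 4 3 2
```

In Gremlin the same get-or-create step is commonly written as `fold().coalesce(unfold(), addV(...))`; that form still performs the index read, so it does not by itself remove this cost.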
Checking the existence of both vertices for every event (and I have thousands of events per second) is very problematic: throughput drops from about 3000/core for insert-only ingestion to 700-800/core when every vertex's existence is checked.
To improve this, I have a few ideas:
1. Look up vertices by ID only, using manual (custom) ID assignment. However, this could affect how vertices are distributed across the Cassandra cluster.
2. Make edge creation an upsert: if a vertex ID doesn't exist, create it without any properties and add the edge. Later, if the vertex-creation event arrives, add the properties to the existing vertex ID (which is effectively an empty vertex at that point).
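Idea 1 depends on computing the vertex ID from the natural key instead of looking it up. A minimal sketch, assuming a hypothetical `natural_key_to_id` helper that hashes the key into a positive 63-bit integer (the range of a Java `long`). It is a model only: in JanusGraph, custom IDs have to be enabled (`graph.set-vertex-id`) and the value still has to be a valid JanusGraph vertex ID, which this sketch does not handle. Whether hashed IDs spread evenly across the Cassandra ring depends on how JanusGraph encodes the ID into the row key, which is also not modeled here.

```python
import hashlib


def natural_key_to_id(key: str) -> int:
    """Hypothetical helper: map a natural key to a deterministic positive 63-bit id."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Take the first 8 bytes as an unsigned int and clear the top bit (Java long range)
    value = int.from_bytes(digest[:8], "big") & 0x7FFF_FFFF_FFFF_FFFF
    return value or 1  # avoid 0, which this sketch treats as unusable


def add_edge_by_computed_ids(edges, src_key, dst_key, label):
    # No index read: both endpoints are addressed by ids computed from their keys
    edges.append((natural_key_to_id(src_key), label, natural_key_to_id(dst_key)))


edges = []
add_edge_by_computed_ids(edges, "user:1", "device:9", "uses")
add_edge_by_computed_ids(edges, "user:1", "device:7", "uses")
print(edges[0][0] == edges[1][0])  # True: the same key yields the same id on any worker
```

Hash collisions between different keys are unlikely with 63 bits but not impossible, so a real design would need to decide how to detect or tolerate them.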
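Idea 2 (placeholder vertices that are filled in later) can be sketched as an in-memory model. This is a hypothetical illustration, not JanusGraph API: it assumes the vertex IDs are already known to the writer (for example, computed as in idea 1), and `dict.setdefault` stands in for a write that creates the vertex only if it is missing.

```python
class PlaceholderGraph:
    """Hypothetical in-memory model of idea 2; not JanusGraph API."""

    def __init__(self):
        self.vertices = {}  # vertex id -> properties ({} marks a placeholder)
        self.edges = []     # (out id, label, in id)

    def _ensure(self, vid):
        # Create an empty placeholder if the id is unknown; never overwrite existing data
        self.vertices.setdefault(vid, {})

    def add_edge(self, out_id, label, in_id):
        # Edge event: endpoints are created as placeholders if needed
        self._ensure(out_id)
        self._ensure(in_id)
        self.edges.append((out_id, label, in_id))

    def upsert_vertex(self, vid, props):
        # Later vertex event: add or update properties on the existing id
        self._ensure(vid)
        self.vertices[vid].update(props)


g = PlaceholderGraph()
g.add_edge(101, "uses", 202)             # both endpoints created as placeholders
g.upsert_vertex(101, {"name": "alice"})  # properties arrive later
print(g.vertices)  # {101: {'name': 'alice'}, 202: {}}
```

Vertex 202 stays an empty placeholder until its own event arrives, so readers of the graph have to tolerate property-less vertices in the meantime.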
What do you think of these ideas? Or do you have other options?
The main question is: how can I create edges without checking whether their vertices exist, so that I avoid the write-performance degradation?