Count Query Optimisation
The Data Model of the graph is as follows:
Label: Node1, count: 130K
Label: Node2, count: 183K
Label: Node3, count: 437K
Label: Node4, count: 156
Node1 to Node2 Label: Edge1, count: 9K
Node2 to Node3 Label: Edge2, count: 200K
Node2 to Node4 Label: Edge3, count: 71K
Node4 to Node3 Label: Edge4, count: 15K
Node4 to Node1 Label: Edge5, count: 1K
The count query used to get the vertex and edge counts:
g2.V().has('title', 'Node2').aggregate('v').outE().has('title','Edge2').aggregate('e').inV().has('title', 'Node3').aggregate('v').select('v').dedup().as('vertexCount').select('e').dedup().as('edgeCount').select('vertexCount','edgeCount').by(unfold().count())
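The following sketch makes explicit what the two numbers are meant to be: distinct vertices and distinct edges on the Node2 -Edge2-> Node3 pattern. It runs on a toy in-memory graph in plain Python and does not reproduce Gremlin's execution exactly. It assumes that, as with TinkerPop 3.4's eager `aggregate()`, every Node2 vertex lands in 'v' before `outE()` filters, and that only Edge2 edges are collected into 'e'. The edge-list layout and the function name `count_pattern` are hypothetical, not JanusGraph API. The loop shows why the count is expensive: every Edge2 edge has to be visited once.

```python
from typing import Dict, List, Tuple

# Hypothetical toy representation: vertex id -> label, and edges as
# (edge id, label, out-vertex id, in-vertex id).
Vertices = Dict[int, str]
Edges = List[Tuple[int, str, int, int]]


def count_pattern(vertices: Vertices, edges: Edges) -> Tuple[int, int]:
    """Return (distinct vertex count, distinct edge count) for Node2 -Edge2-> Node3."""
    # Every Node2 vertex is collected, even ones without an Edge2 edge.
    v = {vid for vid, label in vertices.items() if label == "Node2"}
    e = set()
    for eid, label, out_v, in_v in edges:
        if label == "Edge2" and vertices.get(out_v) == "Node2":
            e.add(eid)  # edge collected before the in-vertex label check
            if vertices.get(in_v) == "Node3":
                v.add(in_v)  # Node3 endpoint collected; the set removes duplicates
    return len(v), len(e)


if __name__ == "__main__":
    vs = {1: "Node2", 2: "Node2", 3: "Node2", 10: "Node3", 11: "Node3", 20: "Node4"}
    es = [
        (100, "Edge2", 1, 10),
        (101, "Edge2", 1, 11),
        (102, "Edge2", 2, 10),  # Node3 vertex 10 reached twice, counted once
        (103, "Edge3", 2, 20),  # different label, ignored
    ]
    print(count_pattern(vs, es))  # (5, 3)
```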
This query takes around 3.5 minutes to execute.
The problem is that traversing the edges takes much longer than reading the vertices:
g.V().has('title','Node3').dedup().count() takes 3 seconds to count 437K vertices.
g.E().has('title','Edge2').dedup().count() takes 1 minute to count 200K edges.
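Converting the two reported timings into per-element throughput shows that the edge scan is more than 40 times slower per element than the vertex scan. This is back-of-the-envelope arithmetic on the figures above only. It ignores caching and any difference in how the two scans are executed.

```python
def throughput(elements: int, seconds: float) -> float:
    """Elements processed per second."""
    return elements / seconds


vertex_rate = throughput(437_000, 3)  # g.V().has('title','Node3')...count()
edge_rate = throughput(200_000, 60)   # g.E().has('title','Edge2')...count()

print(f"vertices/s: {vertex_rate:,.0f}")  # vertices/s: 145,667
print(f"edges/s: {edge_rate:,.0f}")  # edges/s: 3,333
print(f"ratio: {vertex_rate / edge_rate:.1f}x")  # ratio: 43.7x
```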
In some cases subsequent calls are faster due to caching.
I also considered an in-memory backend, but the data is large and I don't think that would work. Is there any way to cache the result the first time a query is executed? Alternatively, is there any approach to loading the graph from the CQL backend into memory to improve performance?
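One way to cache the result of the first execution, independent of JanusGraph itself, is to memoize the count in the application. The sketch below runs the expensive query once and then serves the stored result until it is older than a configurable time-to-live. After that, the next call recomputes it. `CountCache`, `run_query` and the TTL default are hypothetical names for illustration; they are not JanusGraph or gremlinpython API. The trade-off is staleness: counts may lag behind writes until the entry expires or `invalidate()` is called.

```python
import time
from typing import Callable, Dict, Tuple


class CountCache:
    """Hypothetical application-side cache for expensive count queries."""

    def __init__(self, run_query: Callable[[str], int], ttl_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._run_query = run_query  # e.g. a function that submits the Gremlin string
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, int]] = {}  # query -> (timestamp, result)

    def get(self, query: str) -> int:
        now = self._clock()
        hit = self._entries.get(query)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh cached result, no round trip to the graph
        result = self._run_query(query)  # first call or stale entry: recompute
        self._entries[query] = (now, result)
        return result

    def invalidate(self, query: str) -> None:
        """Drop a cached result, e.g. after the graph has been modified."""
        self._entries.pop(query, None)


if __name__ == "__main__":
    calls = []

    def fake_run(q: str) -> int:  # stands in for the real, slow count query
        calls.append(q)
        return 42

    cache = CountCache(fake_run, ttl_seconds=60)
    q = "g.E().has('title','Edge2').count()"
    print(cache.get(q), cache.get(q), len(calls))  # 42 42 1
```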
Please help me improve the performance; a count query should not take this long.
JanusGraph: 0.5.2
Storage: Cassandra (CQL)
The server specification is high and that is not the issue.
Thanks & Regards,
For other readers, see also this other recent thread.
A couple of remarks:
Best wishes, Marc