Count Query Optimisation

Vinayak Bali

Hi All, 

The Data Model of the graph is as follows:


Label: Node1, count: 130K
Label: Node2, count: 183K
Label: Node3, count: 437K
Label: Node4, count: 156


Node1 to Node2 Label: Edge1, count: 9K
Node2 to Node3 Label: Edge2, count: 200K
Node2 to Node4 Label: Edge3, count: 71K
Node4 to Node3 Label: Edge4, count: 15K
Node4 to Node1 Label: Edge5, count: 1K

The count query used to get the vertex and edge counts:

g2.V().has('title', 'Node2').aggregate('v').outE().has('title','Edge2').aggregate('e').inV().has('title', 'Node3').aggregate('v').select('v').dedup().as('vertexCount').select('e').dedup().as('edgeCount').select('vertexCount','edgeCount').by(unfold().count())
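
To make the intent of this traversal explicit, here is a minimal in-memory Python sketch on a toy graph. It mirrors what the query is meant to return (the number of distinct Node2 and Node3 vertices, and the number of distinct Edge2 edges leaving Node2), not Gremlin's exact step-by-step semantics. The toy data and the function name `count_vertices_and_edges` are hypothetical and exist only for illustration.

```python
# Toy data (hypothetical): vertex id -> title, and edges as
# (edge_id, title, out_vertex_id, in_vertex_id).
vertices = {1: "Node2", 2: "Node2", 3: "Node3", 4: "Node3", 5: "Node4"}
edges = [
    ("e1", "Edge2", 1, 3),
    ("e2", "Edge2", 1, 4),
    ("e3", "Edge2", 2, 3),
    ("e4", "Edge3", 2, 5),  # different edge title, so it is ignored
]


def count_vertices_and_edges(vertices, edges, src_title, edge_title, dst_title):
    """Count distinct vertices and edges touched by src -[edge]-> dst."""
    # Every source vertex is collected, even one with no matching edge.
    sources = {vid for vid, title in vertices.items() if title == src_title}
    seen_vertices = set(sources)
    seen_edges = set()

    for edge_id, title, out_v, in_v in edges:
        # Keep only edges with the wanted title that leave a source vertex.
        if out_v in sources and title == edge_title:
            seen_edges.add(edge_id)
            # Collect the target vertex only if it has the wanted title.
            if vertices.get(in_v) == dst_title:
                seen_vertices.add(in_v)

    return {"vertexCount": len(seen_vertices), "edgeCount": len(seen_edges)}


result = count_vertices_and_edges(vertices, edges, "Node2", "Edge2", "Node3")
print(result)
```

The sketch needs a full pass over the edges to produce the edge count, which is the same part that is slow on the real graph.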

The Gremlin traversal above takes around 3.5 minutes to execute and the output returned is as follows:

The problem is that traversing the edges takes far longer than counting the vertices:
g.V().has('title','Node3').dedup().count() takes 3 sec to return 437K nodes.
g.E().has('title','Edge2').dedup().count() takes 1 min to return 200K edges.

In some cases, subsequent calls are faster because of caching.
I also considered the in-memory backend, but the data is large and I don't think that will work. Is there any way to cache the result the first time the query is executed? Or is there an approach to load the graph from the CQL backend into memory to improve performance?
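
To make the caching question concrete, the behaviour being asked about could look like the application-side sketch below: the first call pays the full cost and later calls reuse the result until it expires. `CachedCount` and `slow_edge_count` are hypothetical names, not JanusGraph APIs, and the returned number is a stand-in for the real traversal result. Whether JanusGraph can do something equivalent on the server side is the open question.

```python
import time


class CachedCount:
    """Memoise an expensive count for ttl_seconds (hypothetical helper)."""

    def __init__(self, compute, ttl_seconds, clock=time.monotonic):
        self._compute = compute      # callable that runs the slow query
        self._ttl = ttl_seconds      # how long a cached value stays valid
        self._clock = clock          # injectable clock, handy for testing
        self._value = None
        self._expires_at = None      # None means nothing is cached yet

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Cache miss or expired entry: recompute and store.
            self._value = self._compute()
            self._expires_at = now + self._ttl
        return self._value


calls = []


def slow_edge_count():
    # Stand-in for submitting the slow edge-count traversal to the server.
    calls.append(1)
    return 200_000


edge_count = CachedCount(slow_edge_count, ttl_seconds=300)
first = edge_count.get()   # runs the slow function
second = edge_count.get()  # served from the cache
print(first, second, len(calls))
```

The trade-off is staleness: a cached count will not reflect vertices or edges added within the TTL window.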

Please help me improve the performance; a count query should not take this long.

JanusGraph: 0.5.2
Storage: Cassandra (CQL)
The server is well provisioned, so hardware is not the bottleneck.

Thanks & Regards,
