Count Query Optimisation
Vinayak Bali
Hi All,

The data model of the graph is as follows:

Nodes:
    Label: Node1, count: 130K
    Label: Node2, count: 183K
    Label: Node3, count: 437K
    Label: Node4, count: 156

Relations:
    Node1 to Node2, Label: Edge1, count: 9K
    Node2 to Node3, Label: Edge2, count: 200K
    Node2 to Node4, Label: Edge3, count: 71K
    Node4 to Node3, Label: Edge4, count: 15K
    Node4 to Node1, Label: Edge5, count: 1K

The count query used to get the vertex and edge counts:

    g2.V().has('title', 'Node2').aggregate('v').outE().has('title','Edge2').aggregate('e').inV().has('title', 'Node3').aggregate('v').select('v').dedup().as('vertexCount').select('e').dedup().as('edgeCount').select('vertexCount','edgeCount').by(unfold().count())

This query takes around 3.5 minutes to execute and returns the following output:

    [{"vertexCount":383633,"edgeCount":200166}]

The problem is that traversing the edges takes most of the time:

    g.V().has('title','Node3').dedup().count() takes 3 seconds to return 437K nodes.
    g.E().has('title','Edge2').dedup().count() takes 1 minute to return 200K edges.

In some cases, subsequent calls are faster due to cache usage. I also considered the in-memory backend, but the data is large and I don't think that will work.

Is there any way to cache the result the first time the query is executed? Or is there an approach to load the graph from the CQL backend into memory to improve performance? Please help me improve the performance; a count query should not take this long.

JanusGraph: 0.5.2
Storage: Cassandra (CQL)

The server is well provisioned, so hardware is not the issue.

Thanks & Regards,
Vinayak
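To make explicit what the combined traversal above counts, here is a minimal plain-Python model of its steps on a made-up toy graph. It is only a sketch for reasoning about the numbers: it does not talk to JanusGraph, the vertex ids, edge ids and the helper name count_query are hypothetical, and it assumes aggregate() collects every element that reaches it.

```python
# Toy model of the combined count traversal (plain Python, no JanusGraph).
# All ids, titles and edges below are made-up illustration data.
vertices = {
    "a": "Node2", "b": "Node2", "c": "Node2",
    "x": "Node3", "y": "Node3", "z": "Node3",
    "q": "Node4",
}
# Each edge is (edge id, title, out-vertex id, in-vertex id).
edges = [
    ("e1", "Edge2", "a", "x"),
    ("e2", "Edge2", "a", "y"),
    ("e3", "Edge2", "b", "x"),
    ("e4", "Edge3", "b", "q"),
]


def count_query(vertices, edges):
    """Mimic the traversal: collect vertices and edges, dedup, then count."""
    seen_vertices = set()  # plays the role of side effect 'v' after dedup()
    seen_edges = set()     # plays the role of side effect 'e' after dedup()

    # V().has('title','Node2').aggregate('v')
    starts = {vid for vid, title in vertices.items() if title == "Node2"}
    seen_vertices |= starts

    for eid, title, out_v, in_v in edges:
        # outE().has('title','Edge2').aggregate('e')
        if out_v in starts and title == "Edge2":
            seen_edges.add(eid)
            # inV().has('title','Node3').aggregate('v')
            if vertices[in_v] == "Node3":
                seen_vertices.add(in_v)

    return {"vertexCount": len(seen_vertices), "edgeCount": len(seen_edges)}


result = count_query(vertices, edges)
print(result)
```

In this model, vertexCount covers every Node2 vertex (including "c", which has no Edge2 edge) plus only those Node3 vertices that are reached over Edge2, so "z" is not counted. That reading would be consistent with the reported vertexCount of 383633 being smaller than the combined Node2 and Node3 totals.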
hadoopmarc@...
Hi Vinayak,
For other readers, see also this other recent thread. A couple of remarks:
Best wishes, Marc