Count Query Optimization
Vinayak Bali
Hi All,
The schema consists of A and B as nodes and E as an edge, along with some other nodes and edges.
A: 183468
B: 437317
E: 186513
Query:
g.V().has('property1', 'A').as('v1').outE().has('property1','E').as('e').inV().has('property1', 'B').as('v2').select('v1','e','v2').dedup().count()
Output: 200166
Time taken: 1 min
Query:
g.V().has('property1', 'A').aggregate('v').outE().has('property1','E').aggregate('e').inV().has('property1', 'B').aggregate('v').select('v').dedup().as('vetexCount').select('e').dedup().as('edgeCount').select('vetexCount','edgeCount').by(unfold().count())
Output: ==>[vetexCount:383633,edgeCount:200166]
Time taken: 3.5 mins
property1 is indexed. How can I optimize these queries? Taking minutes for a count query is not acceptable. Please suggest different approaches.
Thanks & Regards, Vinayak
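The first query deduplicates (v1, e, v2) triples. Since an edge has exactly one out-vertex and one in-vertex, that count should equal the number of distinct matching edges, which is consistent with 200166 showing up both as the first output and as edgeCount. A minimal sketch in plain Python (a toy model, not Gremlin; the vertices and edges are made up for illustration):

```python
# Toy graph: vertex id -> property1, and edges as (edge_id, out_v, in_v, property1).
vertices = {1: "A", 2: "A", 3: "B", 4: "B", 5: "C"}
edges = [
    ("e1", 1, 3, "E"),
    ("e2", 1, 3, "E"),  # parallel edge between the same pair of vertices
    ("e3", 2, 4, "E"),
    ("e4", 2, 5, "E"),  # ends on a C vertex, filtered out
    ("e5", 1, 4, "X"),  # wrong edge type, filtered out
]

# Every match of the pattern A -E-> B as a (v1, e, v2) triple.
triples = [
    (out_v, eid, in_v)
    for eid, out_v, in_v, prop in edges
    if prop == "E" and vertices[out_v] == "A" and vertices[in_v] == "B"
]

distinct_triples = set(triples)
distinct_edges = {eid for _, eid, _ in triples}
distinct_vertices = {v for v1, _, v2 in triples for v in (v1, v2)}

print(len(distinct_triples), len(distinct_edges), len(distinct_vertices))  # 3 3 4
```

The triple count and the edge count always agree in this model, so only the vertex count needs a separate pass.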
|
AMIYA KUMAR SAHOO
Hi Vinayak,
For query 1: what is the degree centrality of the vertices having property1 = A, and what percentage of their out-edges have property1 = E? If that percentage is small, a vertex-centric index (VCI) will help to speed up this traversal.
You can also give the query below a try; I am not sure whether it will be faster.
g.V().has('property1', 'A').
  outE().has('property1','E').
  inV().has('property1', 'B').
  dedup().by(path()).
  count()
On Fri, 12 Mar 2021, 13:30 Vinayak Bali, <vinayakbali16@...> wrote:
|
|
hadoopmarc@...
Hi all,
I also thought about the vertex-centric index first, but I am afraid that the VCI can only help to filter the edges to follow; it does not help in counting the edges. A better way to investigate is to leave out the final inV() step. So, e.g., you can count the number of distinct v2 ids with:
g.V().has('property1', 'A').outE().has('property1','E').id().map{it.get().getInVertexId()}.dedup().count()
Note that for an edge, id() returns a RelationIdentifier object that contains the edge id, the in-vertex id and the out-vertex id; v2 is the in-vertex of an edge reached with outE(). This should reduce the number of storage backend calls.
Best wishes, Marc
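The idea of reading endpoint ids from the edge id can be sketched in plain Python. This is a toy model, not the JanusGraph API: the RelationId class and its field names are stand-ins invented for illustration, and the ids are made up.

```python
from typing import NamedTuple


class RelationId(NamedTuple):
    """Toy stand-in for an edge id that embeds both endpoint ids."""
    relation_id: int
    out_vertex_id: int  # vertex the edge leaves (v1)
    in_vertex_id: int   # vertex the edge points to (v2)


# Hypothetical ids of the E edges leaving A vertices.
edge_ids = [
    RelationId(100, out_vertex_id=1, in_vertex_id=10),
    RelationId(101, out_vertex_id=1, in_vertex_id=11),
    RelationId(102, out_vertex_id=2, in_vertex_id=10),
]

# Distinct endpoints are available from the ids alone; no vertex is loaded.
distinct_v2 = {e.in_vertex_id for e in edge_ids}
distinct_v1 = {e.out_vertex_id for e in edge_ids}
print(len(distinct_v2), len(distinct_v1))  # 2 2
```

The ids alone say nothing about the properties of the in-vertex, so a filter such as property1 = B still needs either a vertex lookup or a copy of that value on the edge.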
|
AMIYA KUMAR SAHOO
Hi Marc,
Vinayak's query has a filter on the inV property (property1 = B), hence I did not stop at the edge itself. If this kind of query is frequent, you can decide whether it makes sense to duplicate the same value on both the vertex and the edge. That will help eliminate the traversal from the edge to the vertex.
Regards, Amiya
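A rough sketch of that trade-off in plain Python (a toy model, not Gremlin). The edge key inVproperty1 is a hypothetical name for the duplicated value, and the lookup counter only stands in for storage backend calls:

```python
vertices = {1: {"property1": "A"}, 2: {"property1": "B"}, 3: {"property1": "C"}}

# Each edge carries a copy of the in-vertex's property1 under the
# hypothetical key 'inVproperty1' (it must be written and kept in sync).
edges = [
    {"id": "e1", "out": 1, "in": 2, "property1": "E", "inVproperty1": "B"},
    {"id": "e2", "out": 1, "in": 3, "property1": "E", "inVproperty1": "C"},
]

lookups = {"count": 0}


def load_vertex(vid):
    """Stand-in for fetching a vertex from the storage backend."""
    lookups["count"] += 1
    return vertices[vid]


# Normalized: filter on the in-vertex, one lookup per candidate edge.
normalized = [e["id"] for e in edges
              if e["property1"] == "E" and load_vertex(e["in"])["property1"] == "B"]
lookups_normalized = lookups["count"]

# Denormalized: filter on the edge's own copy, no vertex lookup at all.
lookups["count"] = 0
denormalized = [e["id"] for e in edges
                if e["property1"] == "E" and e["inVproperty1"] == "B"]
lookups_denormalized = lookups["count"]

print(normalized, lookups_normalized, denormalized, lookups_denormalized)
```

Both filters select the same edge, but only the first one touches the vertices. The price is that every change to a vertex's property1 has to be propagated to its incoming edges.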
|
Boxuan Li
Apart from rewriting the query, there are some config options (https://docs.janusgraph.org/basics/configuration-reference/#query) worth trying:
1) Turn on query.batch
2) Turn off query.fast-property
|
Vinayak Bali
Hi All,
The solution from Boxuan Li to change the config options worked for the following query:
g.V().has('property1', 'A').as('v1').outE().has('property1','E').as('e').inV().has('property1', 'B').as('v2').select('v1','e','v2').dedup().count()
But not for the following query:
g.V().has('property1', 'A').aggregate('v').outE().has('property1','E').aggregate('e').inV().has('property1', 'B').aggregate('v').select('v').dedup().as('vetexCount').select('e').dedup().as('edgeCount').select('vetexCount','edgeCount').by(unfold().count())
I need an optimized query that returns both the node count and the edge count. Please share your feedback and help me achieve this.
Thanks & Regards, Vinayak
On Sat, Mar 13, 2021 at 8:16 AM BO XUAN LI <liboxuan@...> wrote:
|
hadoopmarc@...
Hi Vinayak,
Referring to your last post, what happens if you use aggregate(local, 'v') and aggregate(local, 'e')? The local modifier makes the aggregate() step lazy, which hopefully gives JanusGraph more opportunity to batch the storage backend requests.
https://tinkerpop.apache.org/docs/current/reference/#store-step
Best wishes, Marc
|
Vinayak Bali
Hi Marc,
Using local returns an output row after each count. For example:
==>[vetexCount:184439,edgeCount:972]
==>[vetexCount:184440,edgeCount:973]
==>[vetexCount:184441,edgeCount:974]
==>[vetexCount:184442,edgeCount:975]
==>[vetexCount:184443,edgeCount:976]
==>[vetexCount:184444,edgeCount:977]
==>[vetexCount:184445,edgeCount:978]
==>[vetexCount:184446,edgeCount:979]
==>[vetexCount:184447,edgeCount:980]
==>[vetexCount:184448,edgeCount:981]
==>[vetexCount:184449,edgeCount:982]
==>[vetexCount:184450,edgeCount:983]
==>[vetexCount:184451,edgeCount:984]
==>[vetexCount:184452,edgeCount:985]
==>[vetexCount:184453,edgeCount:986]
==>[vetexCount:184454,edgeCount:987]
==>[vetexCount:184455,edgeCount:988]
==>[vetexCount:184456,edgeCount:989]
==>[vetexCount:184457,edgeCount:990]
==>[vetexCount:184458,edgeCount:991]
==>[vetexCount:184459,edgeCount:992]
==>[vetexCount:184460,edgeCount:993]
==>[vetexCount:184461,edgeCount:994]
==>[vetexCount:184462,edgeCount:995]
==>[vetexCount:184463,edgeCount:996]
==>[vetexCount:184464,edgeCount:997]
==>[vetexCount:184465,edgeCount:998]
You can suggest some other approach too. I really need this to work.
Thanks & Regards, Vinayak
On Wed, Mar 17, 2021 at 5:54 PM <hadoopmarc@...> wrote:
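One plausible reading of this output is that a lazy aggregate lets every traverser see a different, still growing collection, so a later dedup() can no longer collapse the results into one row. A minimal sketch of that difference in plain Python generators (a toy model, not Gremlin; the helper names eager and lazy are made up):

```python
def eager(items, bucket):
    """Barrier: drain the whole stream into bucket, then pass the items on."""
    collected = list(items)
    bucket.extend(collected)
    yield from collected


def lazy(items, bucket):
    """No barrier: add each item to bucket as it passes through."""
    for item in items:
        bucket.append(item)
        yield item


edges = ["e1", "e2", "e3"]

# Snapshot of the side-effect collection as seen by each traverser.
seen_eager = []
eager_snapshots = [tuple(seen_eager) for _ in eager(iter(edges), seen_eager)]

seen_lazy = []
lazy_snapshots = [tuple(seen_lazy) for _ in lazy(iter(edges), seen_lazy)]

print([len(s) for s in eager_snapshots])  # [3, 3, 3]
print([len(s) for s in lazy_snapshots])   # [1, 2, 3]
print(len(set(eager_snapshots)), len(set(lazy_snapshots)))  # 1 3
```

With the eager version all snapshots are identical and deduplicate to a single row; with the lazy version each one is different, which matches the incrementing counts above.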
|
Nicolas Trangosi <nicolas.trangosi@...>
Hi,
You may try denormalization: also set property1 of the in-vertex on the edge. Once the edges are updated, the following query should work:
g.V().has('property1', 'A').aggregate('v').outE().has('property1','E').has('inVproperty1', 'B').aggregate('e').inV().aggregate('v').select('v').dedup().as('vetexCount').select('e').dedup().as('edgeCount').select('vetexCount','edgeCount').by(unfold().count())
On Wed, 17 Mar 2021 at 14:05, Vinayak Bali <vinayakbali16@...> wrote:
|
hadoopmarc@...
Hi Vinayak,
Another attempt, this one is very similar to the one that works.
gremlin> graph = JanusGraphFactory.open('conf/janusgraph-inmemory.properties')
==>standardjanusgraph[inmemory:[127.0.0.1]]
gremlin> g = graph.traversal()
==>graphtraversalsource[standardjanusgraph[inmemory:[127.0.0.1]], standard]
gremlin> GraphOfTheGodsFactory.loadWithoutMixedIndex(graph,true)
==>null
gremlin> g.V().as('v1').outE().as('e').inV().as('v2').union(select('v1'), select('v2')).dedup().count()
16:12:39 WARN org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Query requires iterating over all vertices [()]. For better performance, use indexes
==>12
gremlin> g.V().as('v1').outE().as('e').inV().as('v2').select('e').dedup().count()
16:15:30 WARN org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Query requires iterating over all vertices [()]. For better performance, use indexes
==>17
gremlin> g.V().as('v1').outE().as('e').inV().as('v2').union(
......1>     union(select('v1'), select('v2')).dedup().count(),
......2>     select('e').dedup().count().as('ecount')
......3> )
16:27:42 WARN org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Query requires iterating over all vertices [()]. For better performance, use indexes
==>12
==>17
Best wishes, Marc
|
AMIYA KUMAR SAHOO
Hi Vinayak,
Maybe try the query below.
g.V().has('property1', 'A').
  outE().has('property1','E').
  where(inV().has('property1', 'B')).
  fold().
  project('edgeCount', 'vertexCount').
    by(count(local)).
    by(unfold().bothV().dedup().count())
// I do not think dedup is required for your use case; you can try both with and without dedup.
Regards, Amiya
|
Vinayak Bali
Hi Amiya,
With dedup:
g.V().has('property1', 'A').
  outE().has('property1','E').
  where(inV().has('property1', 'B')).
  fold().
  project('edgeCount', 'vertexCount').
    by(count(local)).
    by(unfold().bothV().dedup().count())
Output: ==>[edgeCount:200166,vertexCount:34693]
Without dedup:
g.V().has('property1', 'A').
  outE().has('property1','E').
  where(inV().has('property1', 'B')).
  fold().
  project('edgeCount', 'vertexCount').
    by(count(local)).
    by(unfold().bothV().count())
Output: ==>[edgeCount:200166,vertexCount:400332]
Both queries take approx. 3 sec to run.
Query:
g.V().has('property1', 'A').aggregate('v').outE().has('property1','E').aggregate('e').inV().has('property1', 'B').aggregate('v').select('v').dedup().as('vetexCount').select('e').dedup().as('edgeCount').select('vetexCount','edgeCount').by(unfold().count())
Output: ==>[vetexCount:383633,edgeCount:200166]
Time: 3.5 mins
The edge count is the same for all the queries, but I am getting different vertex counts. Which one is the right vertex count?
Thanks & Regards, Vinayak
On Thu, Mar 18, 2021 at 11:18 AM AMIYA KUMAR SAHOO <amiyakr.sahoo91@...> wrote:
|
AMIYA KUMAR SAHOO
Hi Vinayak,
The correct vertex counts are 400332 (non-unique) and 34693 (unique). In your second query, g.V().has('property1', 'A').aggregate('v') is evaluated eagerly, so all vertices with property1 = A are probably being included in the count, regardless of whether they have an out-edge with property1 = E.
Regards, Amiya
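The three vertex counts in this thread can be reproduced on a toy graph in plain Python (not Gremlin; the data is made up, and the last set only models the explanation given above rather than the engine's actual behaviour). Note that 400332 is exactly 2 x 200166, i.e. two endpoints per matching edge.

```python
vertices = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B"}
# edges: (edge_id, out_vertex, in_vertex, property1)
edges = [("e1", 1, 4, "E"), ("e2", 1, 5, "E"), ("e3", 2, 4, "E")]
# Vertex 3 is an A vertex with no outgoing E edge.

matching = [(eid, o, i) for eid, o, i, p in edges
            if p == "E" and vertices[o] == "A" and vertices[i] == "B"]

# bothV() without dedup: two endpoints per matching edge.
non_unique = [v for _, o, i in matching for v in (o, i)]

# bothV().dedup(): distinct endpoints of matching edges.
unique = set(non_unique)

# Eager aggregate('v') right after the A filter: every A vertex is
# collected before any edge is checked, then the reached B vertices are added.
aggregated = ({v for v, p in vertices.items() if p == "A"}
              | {i for _, _, i in matching})

print(len(non_unique), len(unique), len(aggregated))  # 6 4 5
```

Vertex 3 ends up in the aggregated set although it takes part in no matching edge, which is the kind of overcount that would make the aggregate-based figure larger than the deduplicated one.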
|
Vinayak Bali
Amiya - I need to check the data; there is some mismatch in the counts.
Suppose the count has to span more than one relation. How can we modify the query? For example, for A->E->B the query is as follows:
g.V().has('property1', 'A').
  outE().has('property1','E').
  where(inV().has('property1', 'B')).
  fold().
  project('edgeCount', 'vertexCount').
    by(count(local)).
    by(unfold().bothV().dedup().count())
What changes are needed in the query for A->E->B->E1->C->E2->D?
Thanks
On Thu, Mar 18, 2021 at 1:59 PM AMIYA KUMAR SAHOO <amiyakr.sahoo91@...> wrote:
|
AMIYA KUMAR SAHOO
Hi Vinayak,
Try the query below. If it works for you, you can add E2 and D similarly.
g.V().has('property1', 'A').
  outE().has('property1', 'E').as('e').
  inV().has('property1', 'B').
  outE().has('property1', 'E1').as('e').
  where(inV().has('property1', 'C')).
  // select(all, 'e') yields one list of edges per path, hence unfold().dedup()
  select(all, 'e').unfold().dedup().fold().
  project('edgeCount', 'vertexCount').
    by(count(local)).
    by(unfold().bothV().dedup().count())
Regards, Amiya
On Thu, 18 Mar 2021, 15:47 Vinayak Bali, <vinayakbali16@...> wrote:
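For a longer chain, each complete path contributes a list of edges, and a first-hop edge can lie on several paths, so the per-path lists have to be flattened and deduplicated before counting. A minimal sketch of that bookkeeping in plain Python (a toy model, not Gremlin; the graph and the hop helper are made up):

```python
vertices = {1: "A", 2: "B", 3: "B", 4: "C", 5: "C"}
# edges: (edge_id, out_vertex, in_vertex, property1)
edges = [
    ("e1", 1, 2, "E"),
    ("e2", 1, 3, "E"),   # B vertex 3 has no E1 edge, so this path dies
    ("e3", 2, 4, "E1"),
    ("e4", 2, 5, "E1"),
]


def hop(frontier, edge_type, target_type):
    """Extend each (vertex, path) pair by one matching edge."""
    return [
        (in_v, path + [(eid, out_v, in_v)])
        for vertex, path in frontier
        for eid, out_v, in_v, prop in edges
        if out_v == vertex and prop == edge_type and vertices[in_v] == target_type
    ]


start = [(v, []) for v, prop in vertices.items() if prop == "A"]
paths = [path for _, path in hop(hop(start, "E", "B"), "E1", "C")]

# Flatten the per-path edge lists and deduplicate before counting:
# edge e1 lies on both complete paths but must be counted once.
distinct_edges = {edge for path in paths for edge in path}
distinct_vertices = {v for _, out_v, in_v in distinct_edges for v in (out_v, in_v)}

print(len(paths), len(distinct_edges), len(distinct_vertices))  # 2 3 4
```

Adding E2 and D amounts to one more hop call in this model; edges on paths that do not reach the final vertex type (e2 here) drop out of both counts.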
|
|
Vinayak Bali
Hi All,
Adding these properties to the configuration file affects edge traversal; retrieving a single edge now takes 7 minutes.
1) Turn on query.batch
2) Turn off query.fast-property
The count query is faster, but edge traversal becomes more expensive. Is there any other way to improve count performance without affecting other queries?
Thanks & Regards, Vinayak
On Fri, Mar 19, 2021 at 1:53 AM AMIYA KUMAR SAHOO <amiyakr.sahoo91@...> wrote:
|
|
Boxuan Li
Have you tried keeping query.batch = true AND query.fast-property = true?
Regards, Boxuan
|
|
Vinayak Bali
Hi All,
Setting query.batch = true and query.fast-property = true does not work either; I am facing the same problem. Is there any other way?
Thanks & Regards, Vinayak
On Mon, Mar 22, 2021 at 6:06 PM Boxuan Li <liboxuan@...> wrote:
|
|