Union Query Optimization


AMIYA KUMAR SAHOO
 

Hi Vinayak,

I am not sure how to improve this query further through gremlin.

Query can be faster through data model.  VCI will be helpful, if you are applying any other filter along with hasLabel and  your edge selectivity is low compare  to the total degree of those vertex. 

If this query is very frequent and there's a need to improve it further, you can make inV title property be part of edge and VCI can be enabled on that edge property.

Other than that not sure if any configuration can be done to further improve. Someone else might comment on this front.


Regards,
Amiya


On Thu, 22 Apr 2021, 19:08 Vinayak Bali, <vinayakbali16@...> wrote:
Hi Amiya, 

Thank you for the query. It also increased the performance. But, it's still 35 seconds. Is there any other way to optimize it further, there are only 10 records returned by the query. 
Counts are as follows: 
V1: 187K V2:40 V3: 50 V4: 447K 

Thanks & Regards,
Vinayak

On Thu, Apr 22, 2021 at 12:55 PM AMIYA KUMAR SAHOO <amiyakr.sahoo91@...> wrote:
Hi Vinayak,

You can try below query, it can use index and combine as many traversals you want.

g.inject(1).
   union ( 
      V().has('title', 'V1'). outE().hasLabel('E1').inV().has('title', 'V2'),
    
       V().has('title', 'V3'). outE().hasLabel('E3').inV().has('title', 'V4'))....

Regards,
Amiya



On Thu, 22 Apr 2021, 10:36 Vinayak Bali, <vinayakbali16@...> wrote:
Hi, cmilowka,

The property title has a composite index created on it. Further modified the query as follows:

g.V().has('title',within('V1','V2')).union(has('title', 'V1').as('v1').outE().hasLabel('E1').as('e').inV().has('title', 'V2').as('v2'),has('title', 'V2').as('v1').union(outE().hasLabel('E2').as('e').inV().has('title', 'V2'),outE().hasLabel('E3').as('e').inV().has('title', 'V3')).as('v2')).select('v1','e','v2').by(valueMap().by(unfold()))

the only change is adding has('title',within('V1','V2')) at start of query. The warning is gone now and performance is also improved. 
Earlier the time taken was around 3.5 mins now it's 55 sec to return only 44 records.
The problem is my source changes, need to consider it. For example: 
v1 - e1 -v2
v3 -e2 -v4
Want the want in a single query. now the query for this will be as follows:

g.V().has('title',within('V1','V3')).union(has('title','V1').as('v1').outE().has('title','E1').as('e').inV().has('title','V2').as('v2'),has('title','V3').as('v1').outE().has('title','E2').as('e').inV().has('title','V4').as('v2')).select('v1','e','v2').by(valueMap().by(unfold()))

Request all of you to provide your feedback to improve it further. 

Thanks & Regards,
Vinayak

On Thu, Apr 22, 2021 at 5:14 AM cmilowka <cmilowka@...> wrote:

I guess, building composite index for 'title' property will do the job of accessing title(V1) and title(V2) fast, without full scan of DB as currently does.

cheers, CM


Vinayak Bali
 

Hi Amiya, 

Thank you for the query. It also increased the performance. But, it's still 35 seconds. Is there any other way to optimize it further, there are only 10 records returned by the query. 
Counts are as follows: 
V1: 187K V2:40 V3: 50 V4: 447K 

Thanks & Regards,
Vinayak

On Thu, Apr 22, 2021 at 12:55 PM AMIYA KUMAR SAHOO <amiyakr.sahoo91@...> wrote:
Hi Vinayak,

You can try below query, it can use index and combine as many traversals you want.

g.inject(1).
   union ( 
      V().has('title', 'V1'). outE().hasLabel('E1').inV().has('title', 'V2'),
    
       V().has('title', 'V3'). outE().hasLabel('E3').inV().has('title', 'V4'))....

Regards,
Amiya



On Thu, 22 Apr 2021, 10:36 Vinayak Bali, <vinayakbali16@...> wrote:
Hi, cmilowka,

The property title has a composite index created on it. Further modified the query as follows:

g.V().has('title',within('V1','V2')).union(has('title', 'V1').as('v1').outE().hasLabel('E1').as('e').inV().has('title', 'V2').as('v2'),has('title', 'V2').as('v1').union(outE().hasLabel('E2').as('e').inV().has('title', 'V2'),outE().hasLabel('E3').as('e').inV().has('title', 'V3')).as('v2')).select('v1','e','v2').by(valueMap().by(unfold()))

the only change is adding has('title',within('V1','V2')) at start of query. The warning is gone now and performance is also improved. 
Earlier the time taken was around 3.5 mins now it's 55 sec to return only 44 records.
The problem is my source changes, need to consider it. For example: 
v1 - e1 -v2
v3 -e2 -v4
Want the want in a single query. now the query for this will be as follows:

g.V().has('title',within('V1','V3')).union(has('title','V1').as('v1').outE().has('title','E1').as('e').inV().has('title','V2').as('v2'),has('title','V3').as('v1').outE().has('title','E2').as('e').inV().has('title','V4').as('v2')).select('v1','e','v2').by(valueMap().by(unfold()))

Request all of you to provide your feedback to improve it further. 

Thanks & Regards,
Vinayak

On Thu, Apr 22, 2021 at 5:14 AM cmilowka <cmilowka@...> wrote:

I guess, building composite index for 'title' property will do the job of accessing title(V1) and title(V2) fast, without full scan of DB as currently does.

cheers, CM


AMIYA KUMAR SAHOO
 

Hi Vinayak,

You can try below query, it can use index and combine as many traversals you want.

g.inject(1).
   union ( 
      V().has('title', 'V1'). outE().hasLabel('E1').inV().has('title', 'V2'),
    
       V().has('title', 'V3'). outE().hasLabel('E3').inV().has('title', 'V4'))....

Regards,
Amiya



On Thu, 22 Apr 2021, 10:36 Vinayak Bali, <vinayakbali16@...> wrote:
Hi, cmilowka,

The property title has a composite index created on it. Further modified the query as follows:

g.V().has('title',within('V1','V2')).union(has('title', 'V1').as('v1').outE().hasLabel('E1').as('e').inV().has('title', 'V2').as('v2'),has('title', 'V2').as('v1').union(outE().hasLabel('E2').as('e').inV().has('title', 'V2'),outE().hasLabel('E3').as('e').inV().has('title', 'V3')).as('v2')).select('v1','e','v2').by(valueMap().by(unfold()))

the only change is adding has('title',within('V1','V2')) at start of query. The warning is gone now and performance is also improved. 
Earlier the time taken was around 3.5 mins now it's 55 sec to return only 44 records.
The problem is my source changes, need to consider it. For example: 
v1 - e1 -v2
v3 -e2 -v4
Want the want in a single query. now the query for this will be as follows:

g.V().has('title',within('V1','V3')).union(has('title','V1').as('v1').outE().has('title','E1').as('e').inV().has('title','V2').as('v2'),has('title','V3').as('v1').outE().has('title','E2').as('e').inV().has('title','V4').as('v2')).select('v1','e','v2').by(valueMap().by(unfold()))

Request all of you to provide your feedback to improve it further. 

Thanks & Regards,
Vinayak

On Thu, Apr 22, 2021 at 5:14 AM cmilowka <cmilowka@...> wrote:

I guess, building composite index for 'title' property will do the job of accessing title(V1) and title(V2) fast, without full scan of DB as currently does.

cheers, CM


Vinayak Bali
 

Hi, cmilowka,

The property title has a composite index created on it. Further modified the query as follows:

g.V().has('title',within('V1','V2')).union(has('title', 'V1').as('v1').outE().hasLabel('E1').as('e').inV().has('title', 'V2').as('v2'),has('title', 'V2').as('v1').union(outE().hasLabel('E2').as('e').inV().has('title', 'V2'),outE().hasLabel('E3').as('e').inV().has('title', 'V3')).as('v2')).select('v1','e','v2').by(valueMap().by(unfold()))

the only change is adding has('title',within('V1','V2')) at start of query. The warning is gone now and performance is also improved. 
Earlier the time taken was around 3.5 mins now it's 55 sec to return only 44 records.
The problem is my source changes, need to consider it. For example: 
v1 - e1 -v2
v3 -e2 -v4
Want the want in a single query. now the query for this will be as follows:

g.V().has('title',within('V1','V3')).union(has('title','V1').as('v1').outE().has('title','E1').as('e').inV().has('title','V2').as('v2'),has('title','V3').as('v1').outE().has('title','E2').as('e').inV().has('title','V4').as('v2')).select('v1','e','v2').by(valueMap().by(unfold()))

Request all of you to provide your feedback to improve it further. 

Thanks & Regards,
Vinayak

On Thu, Apr 22, 2021 at 5:14 AM cmilowka <cmilowka@...> wrote:

I guess, building composite index for 'title' property will do the job of accessing title(V1) and title(V2) fast, without full scan of DB as currently does.

cheers, CM


cmilowka
 

I guess, building composite index for 'title' property will do the job of accessing title(V1) and title(V2) fast, without full scan of DB as currently does.

cheers, CM


Vinayak Bali
 

Hi All, 

I need to select multiple nodes and edges and display the content in v1 - e - v2 format. The query generated is as follows:

g.V().union(has('title', 'V1').as('v1').outE().hasLabel('E1').as('e').inV().has('title', 'V2').as('v2'),has('title', 'V2').as('v1').union(outE().hasLabel('E2').as('e').inV().has('title', 'V2'),outE().hasLabel('E3').as('e').inV().has('title', 'V3')).as('v2')).select('v1','e','v2').by(valueMap().by(unfold()))

It throws the warning:
05:20:21 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [()]. For better performance, use indexes

How can we optimize the query may be without a union step?

Thanks & Regards,
Vinayak