SimplePath query is slower on a 6-node vs. a 3-node Cassandra cluster
Varun Ganesh <operatio...@...>
Hello,
I am currently using JanusGraph version 0.5.2. I have a graph with about 18 million vertices and 25 million edges.
I have two versions of this graph: one backed by a 3-node Cassandra cluster and another backed by a 6-node Cassandra cluster (both with a replication factor of 3).
I am running the query below on both of them:
g.V().hasLabel('label_A').has('some_id', 123).has('data.name', 'value1').repeat(both('sample_edge').simplePath()).until(has('data.name', 'value2')).path().by('data.name').next()
The issue is that this query takes ~130 ms on the 3-node cluster, whereas it takes ~400 ms on the 6-node cluster.
I have tried running .profile() on both versions, and the outputs are almost identical in terms of the steps and the time taken:
g.V().hasLabel('label_A').has('some_id', 123).has('data.name', 'value1').repeat(both('sample_edge').simplePath()).until(has('data.name', 'value2')).path().by('data.name').limit(1).profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[~label.eq(label_A), o...                            1           1           4.582     0.39
    \_condition=(~label = label_A AND some_id = 123 AND data.name = value1)
    \_orders=[]
    \_isFitted=true
    \_isOrdered=true
    \_query=multiKSQ[1]@8000
    \_index=someVertexByNameComposite
  optimization                                                                                 0.028
  optimization                                                                                 0.907
  backend-query                                                        1                       3.012
    \_query=someVertexByNameComposite:multiKSQ[1]@8000
    \_limit=8000
RepeatStep([JanusGraphVertexStep(BOTH,[...                             2           2        1167.493    99.45
  HasStep([data.name.eq(...                                                                  803.247
  JanusGraphVertexStep(BOTH,[...                                   12934       12934         334.095
      \_condition=type[sample_edge]
      \_orders=[]
      \_isFitted=true
      \_isOrdered=true
      \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@812d311c
      \_multi=true
      \_vertices=264
    optimization                                                                               0.073
    backend-query                                                    266                       5.640
      \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@812d311c
    optimization                                                                               0.028
    backend-query                                                  12689                     312.544
      \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@812d311c
  PathFilterStep(simple)                                           12441       12441          10.980
  JanusGraphMultiQueryStep(RepeatEndStep)                           1187        1187          11.825
  RepeatEndStep                                                        2           2         810.468
RangeGlobalStep(0,1)                                                   1           1           0.419     0.04
PathStep([value(data.name)])                                           1           1           1.474     0.13
>TOTAL                                                                 -           -        1173.969        -
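For readers less familiar with the traversal, here is a minimal Python sketch of what repeat(both('sample_edge').simplePath()).until(has('data.name', 'value2')).path() computes. It assumes a hypothetical in-memory adjacency dict in place of the JanusGraph storage backend.

The sketch works as follows:
- It expands paths one hop at a time.
- It drops any path that revisits a vertex, which is what simplePath() does.
- It stops at the first path whose end vertex matches the condition, which is what until() does.

Expansion here is breadth-first; Gremlin's actual evaluation order may differ. The lookup counter is only a rough stand-in for the backend-query rows above. Every expanded vertex needs its adjacency list read, which is consistent with the profile reporting thousands of edge lookups for a single returned path.

```python
from collections import deque


def first_simple_path(adjacency, start, is_target):
    """Sketch of repeat(both().simplePath()).until(cond).path() semantics.

    adjacency: dict mapping each vertex to its neighbours (edges are treated
    as undirected, like both()).
    Returns (path, lookups): the first path found whose last vertex satisfies
    is_target (or None), and how many adjacency lists were read.
    """
    frontier = deque([[start]])
    lookups = 0
    while frontier:
        path = frontier.popleft()
        lookups += 1  # one adjacency-list read per expanded vertex
        for neighbour in adjacency.get(path[-1], ()):
            if neighbour in path:
                continue  # simplePath(): drop paths that revisit a vertex
            new_path = path + [neighbour]
            if is_target(neighbour):
                return new_path, lookups  # until(): stop at the first match
            frontier.append(new_path)
    return None, lookups


if __name__ == "__main__":
    # Hypothetical toy graph; vertex names stand in for data.name values.
    graph = {
        "value1": ["b", "c"],
        "b": ["value1", "d"],
        "c": ["value1", "d"],
        "d": ["b", "c", "value2"],
        "value2": ["d"],
    }
    path, lookups = first_simple_path(graph, "value1", lambda v: v == "value2")
    print(path, lookups)  # ['value1', 'b', 'd', 'value2'] 4
```

The until() condition is only tested on vertices reached by the repeat body, not on the start vertex. This mirrors until() placed after repeat() in the original query.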
I'd really appreciate some input on figuring out why the query is 3x slower on 6 nodes.
I realise that you may require more context. Happy to provide more information as required!
Thank you!
Varun Ganesh <operatio...@...>
Just an additional note: you may have noticed that the profile output above reports a total time of over 1000 ms. I do not know why this is the case.
When the query is run in the console without profile(), the timing reflects the true time taken:
gremlin> clockWithResult(10) { graph.tx().rollback(); g.V().hasLabel('label_A').has('some_id', 123).has('data.name', 'value1').repeat(both('sample_edge').simplePath()).until(has('data.name', 'value2')).path().by('data.name').limit(1).next() }
==>130.9545608
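The same kind of latency measurement can also be taken outside the console, for example from client code. Below is a minimal Python sketch that relies on two hypothetical callables:
- run_query() executes the traversal.
- reset() plays the role of graph.tx().rollback().

The sketch runs the query a fixed number of times and returns the mean wall-clock latency in milliseconds, together with the last result. It is an illustration only, not the console's own implementation.

```python
import time
from statistics import mean


def clock_with_result(loops, run_query, reset=None):
    """Time run_query() `loops` times; return (mean ms, last result).

    reset (optional) runs before each iteration, outside the timed region,
    e.g. to discard per-transaction caches between runs.
    """
    if loops < 1:
        raise ValueError("loops must be >= 1")
    timings_ms = []
    result = None
    for _ in range(loops):
        if reset is not None:
            reset()
        start = time.perf_counter()
        result = run_query()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return mean(timings_ms), result


if __name__ == "__main__":
    # Stand-in workload; replace with a real client call to the graph.
    avg_ms, value = clock_with_result(10, lambda: sum(range(100_000)))
    print(f"{avg_ms:.3f} ms ->", value)
```

One design difference: the console command above calls rollback() inside the timed closure, while this sketch keeps the reset outside the timed region. The numbers are therefore close but not strictly comparable.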
Thanks!
On Tuesday, November 24, 2020 at 4:35:22 PM UTC-5 Varun Ganesh wrote:
BO XUAN LI <libo...@...>
Hi,
> why the query is 3x slower on 6 nodes
Did you check for hardware differences? Perhaps the 6-node cluster has a slower network, less memory, slower disks, etc.
Another possibility I can think of is that the data involved in your query is distributed across nodes. Since your 3-node Cassandra cluster has a replication factor of 3, I would presume that every node holds a full copy of your data, so fewer round trips would happen within the 3-node cluster.
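The data-placement argument can be made concrete with a little arithmetic. With one token per node and SimpleStrategy-style placement, each node stores roughly RF/N of the data:
- 3/3 = 100% on the 3-node cluster.
- About 3/6 = 50% on the 6-node cluster.

The Python sketch below checks this by simulation, under these assumptions:
- Node tokens are evenly spaced and there are no virtual nodes.
- MD5 of a hypothetical key string serves as the partition hash. Cassandra's default Murmur3Partitioner hashes differently, but any roughly uniform hash gives roughly the same fractions.

It does not model consistency levels or token-aware client routing, both of which also affect how many network hops a read actually takes.

```python
import hashlib
from bisect import bisect_left


def replica_nodes(key, node_tokens, rf):
    """Nodes holding `key` on a single-token ring with SimpleStrategy-style
    placement: the first node whose token is >= the key's token, plus the
    next rf - 1 nodes clockwise."""
    token = int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")
    ring = sorted(node_tokens.items(), key=lambda kv: kv[1])
    tokens = [t for _, t in ring]
    first = bisect_left(tokens, token) % len(ring)  # wrap past the last token
    return {ring[(first + i) % len(ring)][0] for i in range(rf)}


def local_fraction(num_nodes, rf, num_keys=10_000):
    """Fraction of keys that have a replica on node0 (e.g. a coordinator)."""
    step = 2**64 // num_nodes
    node_tokens = {f"node{i}": i * step for i in range(num_nodes)}
    hits = sum(
        "node0" in replica_nodes(f"vertex-{k}", node_tokens, rf)
        for k in range(num_keys)
    )
    return hits / num_keys


if __name__ == "__main__":
    for n in (3, 6):
        print(n, "nodes, RF=3:", local_fraction(n, 3), "expected", min(1.0, 3 / n))
```

On the 3-node ring every key maps to all three nodes, so the fraction is exactly 1.0. On the 6-node ring a given node holds a replica for only about half of the keys, so the other half must be read from a different node.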
Generally, it makes sense to me that a small cluster has lower latency than a large one, as long as neither cluster is fully loaded. Of course, a larger cluster can achieve higher throughput.
> the profile step above shows a time taken of >1000ms
This could be a bug in profiling. If you can provide a minimal example that reproduces it, that would be very helpful.
Best regards,
Boxuan
On Nov 25, 2020, at 6:04 AM, Varun Ganesh <operatio...@...> wrote:
You received this message because you are subscribed to the Google Groups "JanusGraph users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgra...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-users/6d2483f7-062a-4a95-98b2-6b4aafa87cd3n%40googlegroups.com.
Varun Ganesh <operatio...@...>
Hi Boxuan,
Thank you for getting back to me. Please find my responses below:
> Did you check the hardware differences?
Yes, I can confirm that the two clusters are identical except for the number of nodes.
> the data involved in your query is probably distributed across nodes
This was our initial guess as well. However, if that were the case, we would expect to see this slowdown for every query we try, but we only observe it for path queries.
For instance, here's an example of another traversal query where we observe the SAME latency on the 3- and 6-node clusters:
g.V().hasLabel('label_B').has('some_id', 123).has('data.name', 1234567).both('sample_edge').valueMap('data.field1', 'data.field2').next(10)
> Then there would be fewer round-trips happening within the 3-node cluster
I also want to point out that we are not running JanusGraph in embedded mode (where it is co-located with Cassandra); instead, it runs separately on its own server nodes.
> Of course with larger cluster you can achieve higher throughput
Interestingly, we do not observe any difference in throughput (i.e., the maximum queries per second that can be handled without timeouts) between the two clusters.
I would appreciate any input on where and how we could investigate further.
Thank you!
Varun
On Thursday, November 26, 2020 at 11:19:32 AM UTC-5 li...@... wrote: