HI, I am curious why the result place holder is set to size of 4 in MultiKeySliceQuery.java:49? If the incoming query is of size 1000 with corresponding 1000 results. There will be unnecessary resizing taking place. Wouldn't it be fair to set the initial capacity of the array list to be queries.size()?
thanks
|
|
Can you share the queries and/or profile() (only useful from v0.2) that show multiKSQ[x] where x is the size of the queries?
What portion of overall queries/frequency do these queries represent?
Do you have any other quantitative analysis supporting your theory?
Looking at the loop, the result size is not necessarily the same as the query size. It is limited by the aggregate of the sub keySliceQuery sizes and a hard limit. I'm not familiar enough with the internals to understand how these are affected or controlled. However, in my own queries, only ones with within() predicates seem to affect the query size. And these are rarely used.
toggle quoted message
Show quoted text
|
|
Hi Robert, I am using 0.2 pre-release for the testing. My queries are as followed: tx.query().has("customId", Contain.IN, customIds1000) where I have different queries with lookup of 500, 1000, 5000, 100000, 50000 IDs of type String As JanusGraph do not support non-numeric IDs, this is our main use case for the ID indirection to map the string ID to JanusGraph's vertices.
I ran the queries with and without the change, I see the performance boost of roughly 30% in query time. I agree that the result size is not necessarily the same as the query size, but in best (worst?) case, the results would be the same as the query size. Regarding the hard limit, it is a very good point and we can address it by simply using Math.max(getLimit(), queries.size())
thanks C
toggle quoted message
Show quoted text
On Tuesday, October 17, 2017 at 4:50:25 AM UTC-7, Robert Dale wrote: Can you share the queries and/or profile() (only useful from v0.2) that show multiKSQ[x] where x is the size of the queries?
What portion of overall queries/frequency do these queries represent?
Do you have any other quantitative analysis supporting your theory?
Looking at the loop, the result size is not necessarily the same as the query size. It is limited by the aggregate of the sub keySliceQuery sizes and a hard limit. I'm not familiar enough with the internals to understand how these are affected or controlled. However, in my own queries, only ones with within() predicates seem to affect the query size. And these are rarely used.
|
|
Did you mean Math.min() ?
toggle quoted message
Show quoted text
On Tue, Oct 17, 2017 at 12:50 PM, Calvin Lei <ckp...@...> wrote: Hi Robert, I am using 0.2 pre-release for the testing. My queries are as followed: tx.query().has("customId", Contain.IN, customIds1000) where I have different queries with lookup of 500, 1000, 5000, 100000, 50000 IDs of type String As JanusGraph do not support non-numeric IDs, this is our main use case for the ID indirection to map the string ID to JanusGraph's vertices.
I ran the queries with and without the change, I see the performance boost of roughly 30% in query time. I agree that the result size is not necessarily the same as the query size, but in best (worst?) case, the results would be the same as the query size. Regarding the hard limit, it is a very good point and we can address it by simply using Math.max(getLimit(), queries.size())
thanks C On Tuesday, October 17, 2017 at 4:50:25 AM UTC-7, Robert Dale wrote: Can you share the queries and/or profile() (only useful from v0.2) that show multiKSQ[x] where x is the size of the queries?
What portion of overall queries/frequency do these queries represent?
Do you have any other quantitative analysis supporting your theory?
Looking at the loop, the result size is not necessarily the same as the query size. It is limited by the aggregate of the sub keySliceQuery sizes and a hard limit. I'm not familiar enough with the internals to understand how these are affected or controlled. However, in my own queries, only ones with within() predicates seem to affect the query size. And these are rarely used.
--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-dev+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/5700a0d4-222a-4440-8f37-de121ac9e78a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
|
|
correct :). Or we can choose to use LinkedList. I see similar improvement in my test. thanks, C
toggle quoted message
Show quoted text
On Wednesday, October 18, 2017 at 3:32:30 AM UTC-7, Robert Dale wrote: Did you mean Math.min() ?
On Tue, Oct 17, 2017 at 12:50 PM, Calvin Lei <c...@...> wrote: Hi Robert, I am using 0.2 pre-release for the testing. My queries are as followed: tx.query().has("customId", Contain.IN, customIds1000) where I have different queries with lookup of 500, 1000, 5000, 100000, 50000 IDs of type String As JanusGraph do not support non-numeric IDs, this is our main use case for the ID indirection to map the string ID to JanusGraph's vertices.
I ran the queries with and without the change, I see the performance boost of roughly 30% in query time. I agree that the result size is not necessarily the same as the query size, but in best (worst?) case, the results would be the same as the query size. Regarding the hard limit, it is a very good point and we can address it by simply using Math.max(getLimit(), queries.size())
thanks C On Tuesday, October 17, 2017 at 4:50:25 AM UTC-7, Robert Dale wrote: Can you share the queries and/or profile() (only useful from v0.2) that show multiKSQ[x] where x is the size of the queries?
What portion of overall queries/frequency do these queries represent?
Do you have any other quantitative analysis supporting your theory?
Looking at the loop, the result size is not necessarily the same as the query size. It is limited by the aggregate of the sub keySliceQuery sizes and a hard limit. I'm not familiar enough with the internals to understand how these are affected or controlled. However, in my own queries, only ones with within() predicates seem to affect the query size. And these are rarely used.
--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/5700a0d4-222a-4440-8f37-de121ac9e78a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
|
|
You should create an issue - https://github.com/JanusGraph/janusgraph/issues - and make a pull request.
toggle quoted message
Show quoted text
On Wednesday, October 18, 2017 at 12:41:46 PM UTC-4, Calvin Lei wrote: correct :). Or we can choose to use LinkedList. I see similar improvement in my test. thanks, C On Wednesday, October 18, 2017 at 3:32:30 AM UTC-7, Robert Dale wrote: Did you mean Math.min() ?
On Tue, Oct 17, 2017 at 12:50 PM, Calvin Lei <c...@...> wrote: Hi Robert, I am using 0.2 pre-release for the testing. My queries are as followed: tx.query().has("customId", Contain.IN, customIds1000) where I have different queries with lookup of 500, 1000, 5000, 100000, 50000 IDs of type String As JanusGraph do not support non-numeric IDs, this is our main use case for the ID indirection to map the string ID to JanusGraph's vertices.
I ran the queries with and without the change, I see the performance boost of roughly 30% in query time. I agree that the result size is not necessarily the same as the query size, but in best (worst?) case, the results would be the same as the query size. Regarding the hard limit, it is a very good point and we can address it by simply using Math.max(getLimit(), queries.size())
thanks C On Tuesday, October 17, 2017 at 4:50:25 AM UTC-7, Robert Dale wrote: Can you share the queries and/or profile() (only useful from v0.2) that show multiKSQ[x] where x is the size of the queries?
What portion of overall queries/frequency do these queries represent?
Do you have any other quantitative analysis supporting your theory?
Looking at the loop, the result size is not necessarily the same as the query size. It is limited by the aggregate of the sub keySliceQuery sizes and a hard limit. I'm not familiar enough with the internals to understand how these are affected or controlled. However, in my own queries, only ones with within() predicates seem to affect the query size. And these are rarely used.
--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/5700a0d4-222a-4440-8f37-de121ac9e78a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
|
|
Thanks Robert. I am in the process of getting the CLA signed. Once done I will create a PR. Let me create an issue in the mean time.
toggle quoted message
Show quoted text
On Wednesday, October 18, 2017 at 10:07:37 AM UTC-7, Robert Dale wrote: You should create an issue - https://github.com/JanusGraph/janusgraph/issues - and make a pull request. On Wednesday, October 18, 2017 at 12:41:46 PM UTC-4, Calvin Lei wrote: correct :). Or we can choose to use LinkedList. I see similar improvement in my test. thanks, C On Wednesday, October 18, 2017 at 3:32:30 AM UTC-7, Robert Dale wrote: Did you mean Math.min() ?
On Tue, Oct 17, 2017 at 12:50 PM, Calvin Lei <c...@...> wrote: Hi Robert, I am using 0.2 pre-release for the testing. My queries are as followed: tx.query().has("customId", Contain.IN, customIds1000) where I have different queries with lookup of 500, 1000, 5000, 100000, 50000 IDs of type String As JanusGraph do not support non-numeric IDs, this is our main use case for the ID indirection to map the string ID to JanusGraph's vertices.
I ran the queries with and without the change, I see the performance boost of roughly 30% in query time. I agree that the result size is not necessarily the same as the query size, but in best (worst?) case, the results would be the same as the query size. Regarding the hard limit, it is a very good point and we can address it by simply using Math.max(getLimit(), queries.size())
thanks C On Tuesday, October 17, 2017 at 4:50:25 AM UTC-7, Robert Dale wrote: Can you share the queries and/or profile() (only useful from v0.2) that show multiKSQ[x] where x is the size of the queries?
What portion of overall queries/frequency do these queries represent?
Do you have any other quantitative analysis supporting your theory?
Looking at the loop, the result size is not necessarily the same as the query size. It is limited by the aggregate of the sub keySliceQuery sizes and a hard limit. I'm not familiar enough with the internals to understand how these are affected or controlled. However, in my own queries, only ones with within() predicates seem to affect the query size. And these are rarely used.
--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/5700a0d4-222a-4440-8f37-de121ac9e78a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
|
|