Small initial size of place holder for query results in MultiKeySliceQuery.java


Calvin Lei <ckp...@...>
 

HI, 
   I am curious why the result place holder is set to size of 4 in MultiKeySliceQuery.java:49? If the incoming query is of size 1000 with corresponding 1000 results. There will be unnecessary resizing taking place. Wouldn't it be fair to set the initial capacity of the array list to be queries.size()?

thanks
 


Robert Dale <rob...@...>
 

Can you share the queries and/or profile() (only useful from v0.2) that show multiKSQ[x] where x is the size of the queries?

What portion of overall queries/frequency do these queries represent?

Do you have any other quantitative analysis supporting your theory?

Looking at the loop, the result size is not necessarily the same as the query size. It is limited by the aggregate of the sub keySliceQuery sizes and a hard limit. I'm not familiar enough with the internals to understand how these are affected or controlled. However, in my own queries, only ones with within() predicates seem to affect the query size. And these are rarely used.



Robert Dale

On Mon, Oct 16, 2017 at 7:57 PM, Calvin Lei <ckp...@...> wrote:
HI, 
   I am curious why the result place holder is set to size of 4 in MultiKeySliceQuery.java:49? If the incoming query is of size 1000 with corresponding 1000 results. There will be unnecessary resizing taking place. Wouldn't it be fair to set the initial capacity of the array list to be queries.size()?

thanks
 

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-dev+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/4cac10c4-e001-4789-bb1e-c5b3f995be85%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Calvin Lei <ckp...@...>
 

Hi Robert,
   I am using 0.2 pre-release for the testing. My queries are as followed:
          tx.query().has("customId", Contain.IN, customIds1000)
   where I have different queries with lookup of 500, 1000, 5000, 100000, 50000 IDs of type String
  
   As JanusGraph do not support non-numeric IDs, this is our main use case for the ID indirection to map the string ID to JanusGraph's vertices. 

   I ran the queries with and without the change, I see the performance boost of roughly 30% in query time. I agree that the result size is not necessarily the same as the query size, but in best (worst?) case, the results would be the same as the query size. Regarding the hard limit, it is a very good point and we can address it by simply using Math.max(getLimit(), queries.size())

thanks
C


On Tuesday, October 17, 2017 at 4:50:25 AM UTC-7, Robert Dale wrote:
Can you share the queries and/or profile() (only useful from v0.2) that show multiKSQ[x] where x is the size of the queries?

What portion of overall queries/frequency do these queries represent?

Do you have any other quantitative analysis supporting your theory?

Looking at the loop, the result size is not necessarily the same as the query size. It is limited by the aggregate of the sub keySliceQuery sizes and a hard limit. I'm not familiar enough with the internals to understand how these are affected or controlled. However, in my own queries, only ones with within() predicates seem to affect the query size. And these are rarely used.



Robert Dale

On Mon, Oct 16, 2017 at 7:57 PM, Calvin Lei <c...@...> wrote:
HI, 
   I am curious why the result place holder is set to size of 4 in MultiKeySliceQuery.java:49? If the incoming query is of size 1000 with corresponding 1000 results. There will be unnecessary resizing taking place. Wouldn't it be fair to set the initial capacity of the array list to be queries.size()?

thanks
 

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/4cac10c4-e001-4789-bb1e-c5b3f995be85%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Robert Dale <rob...@...>
 

Did you mean Math.min() ?

Robert Dale

On Tue, Oct 17, 2017 at 12:50 PM, Calvin Lei <ckp...@...> wrote:
Hi Robert,
   I am using 0.2 pre-release for the testing. My queries are as followed:
          tx.query().has("customId", Contain.IN, customIds1000)
   where I have different queries with lookup of 500, 1000, 5000, 100000, 50000 IDs of type String
  
   As JanusGraph do not support non-numeric IDs, this is our main use case for the ID indirection to map the string ID to JanusGraph's vertices. 

   I ran the queries with and without the change, I see the performance boost of roughly 30% in query time. I agree that the result size is not necessarily the same as the query size, but in best (worst?) case, the results would be the same as the query size. Regarding the hard limit, it is a very good point and we can address it by simply using Math.max(getLimit(), queries.size())

thanks
C

On Tuesday, October 17, 2017 at 4:50:25 AM UTC-7, Robert Dale wrote:
Can you share the queries and/or profile() (only useful from v0.2) that show multiKSQ[x] where x is the size of the queries?

What portion of overall queries/frequency do these queries represent?

Do you have any other quantitative analysis supporting your theory?

Looking at the loop, the result size is not necessarily the same as the query size. It is limited by the aggregate of the sub keySliceQuery sizes and a hard limit. I'm not familiar enough with the internals to understand how these are affected or controlled. However, in my own queries, only ones with within() predicates seem to affect the query size. And these are rarely used.



Robert Dale

On Mon, Oct 16, 2017 at 7:57 PM, Calvin Lei <c...@...> wrote:
HI, 
   I am curious why the result place holder is set to size of 4 in MultiKeySliceQuery.java:49? If the incoming query is of size 1000 with corresponding 1000 results. There will be unnecessary resizing taking place. Wouldn't it be fair to set the initial capacity of the array list to be queries.size()?

thanks
 

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/4cac10c4-e001-4789-bb1e-c5b3f995be85%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-dev+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/5700a0d4-222a-4440-8f37-de121ac9e78a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Calvin Lei <ckp...@...>
 

correct :). Or we can choose to use LinkedList. I see similar improvement in my test. 

thanks,
C


On Wednesday, October 18, 2017 at 3:32:30 AM UTC-7, Robert Dale wrote:
Did you mean Math.min() ?

Robert Dale

On Tue, Oct 17, 2017 at 12:50 PM, Calvin Lei <c...@...> wrote:
Hi Robert,
   I am using 0.2 pre-release for the testing. My queries are as followed:
          tx.query().has("customId", Contain.IN, customIds1000)
   where I have different queries with lookup of 500, 1000, 5000, 100000, 50000 IDs of type String
  
   As JanusGraph do not support non-numeric IDs, this is our main use case for the ID indirection to map the string ID to JanusGraph's vertices. 

   I ran the queries with and without the change, I see the performance boost of roughly 30% in query time. I agree that the result size is not necessarily the same as the query size, but in best (worst?) case, the results would be the same as the query size. Regarding the hard limit, it is a very good point and we can address it by simply using Math.max(getLimit(), queries.size())

thanks
C

On Tuesday, October 17, 2017 at 4:50:25 AM UTC-7, Robert Dale wrote:
Can you share the queries and/or profile() (only useful from v0.2) that show multiKSQ[x] where x is the size of the queries?

What portion of overall queries/frequency do these queries represent?

Do you have any other quantitative analysis supporting your theory?

Looking at the loop, the result size is not necessarily the same as the query size. It is limited by the aggregate of the sub keySliceQuery sizes and a hard limit. I'm not familiar enough with the internals to understand how these are affected or controlled. However, in my own queries, only ones with within() predicates seem to affect the query size. And these are rarely used.



Robert Dale

On Mon, Oct 16, 2017 at 7:57 PM, Calvin Lei <c...@...> wrote:
HI, 
   I am curious why the result place holder is set to size of 4 in MultiKeySliceQuery.java:49? If the incoming query is of size 1000 with corresponding 1000 results. There will be unnecessary resizing taking place. Wouldn't it be fair to set the initial capacity of the array list to be queries.size()?

thanks
 

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/4cac10c4-e001-4789-bb1e-c5b3f995be85%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/5700a0d4-222a-4440-8f37-de121ac9e78a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Robert Dale <rob...@...>
 

You should create an issue - https://github.com/JanusGraph/janusgraph/issues - and make a pull request.


On Wednesday, October 18, 2017 at 12:41:46 PM UTC-4, Calvin Lei wrote:
correct :). Or we can choose to use LinkedList. I see similar improvement in my test. 

thanks,
C

On Wednesday, October 18, 2017 at 3:32:30 AM UTC-7, Robert Dale wrote:
Did you mean Math.min() ?

Robert Dale

On Tue, Oct 17, 2017 at 12:50 PM, Calvin Lei <c...@...> wrote:
Hi Robert,
   I am using 0.2 pre-release for the testing. My queries are as followed:
          tx.query().has("customId", Contain.IN, customIds1000)
   where I have different queries with lookup of 500, 1000, 5000, 100000, 50000 IDs of type String
  
   As JanusGraph do not support non-numeric IDs, this is our main use case for the ID indirection to map the string ID to JanusGraph's vertices. 

   I ran the queries with and without the change, I see the performance boost of roughly 30% in query time. I agree that the result size is not necessarily the same as the query size, but in best (worst?) case, the results would be the same as the query size. Regarding the hard limit, it is a very good point and we can address it by simply using Math.max(getLimit(), queries.size())

thanks
C

On Tuesday, October 17, 2017 at 4:50:25 AM UTC-7, Robert Dale wrote:
Can you share the queries and/or profile() (only useful from v0.2) that show multiKSQ[x] where x is the size of the queries?

What portion of overall queries/frequency do these queries represent?

Do you have any other quantitative analysis supporting your theory?

Looking at the loop, the result size is not necessarily the same as the query size. It is limited by the aggregate of the sub keySliceQuery sizes and a hard limit. I'm not familiar enough with the internals to understand how these are affected or controlled. However, in my own queries, only ones with within() predicates seem to affect the query size. And these are rarely used.



Robert Dale

On Mon, Oct 16, 2017 at 7:57 PM, Calvin Lei <c...@...> wrote:
HI, 
   I am curious why the result place holder is set to size of 4 in MultiKeySliceQuery.java:49? If the incoming query is of size 1000 with corresponding 1000 results. There will be unnecessary resizing taking place. Wouldn't it be fair to set the initial capacity of the array list to be queries.size()?

thanks
 

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/4cac10c4-e001-4789-bb1e-c5b3f995be85%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/5700a0d4-222a-4440-8f37-de121ac9e78a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Calvin Lei <ckp...@...>
 

Thanks Robert. I am in the process of getting the CLA signed. Once done I will create a PR. Let me create an issue in the mean time.


On Wednesday, October 18, 2017 at 10:07:37 AM UTC-7, Robert Dale wrote:
You should create an issue - https://github.com/JanusGraph/janusgraph/issues - and make a pull request.

On Wednesday, October 18, 2017 at 12:41:46 PM UTC-4, Calvin Lei wrote:
correct :). Or we can choose to use LinkedList. I see similar improvement in my test. 

thanks,
C

On Wednesday, October 18, 2017 at 3:32:30 AM UTC-7, Robert Dale wrote:
Did you mean Math.min() ?

Robert Dale

On Tue, Oct 17, 2017 at 12:50 PM, Calvin Lei <c...@...> wrote:
Hi Robert,
   I am using 0.2 pre-release for the testing. My queries are as followed:
          tx.query().has("customId", Contain.IN, customIds1000)
   where I have different queries with lookup of 500, 1000, 5000, 100000, 50000 IDs of type String
  
   As JanusGraph do not support non-numeric IDs, this is our main use case for the ID indirection to map the string ID to JanusGraph's vertices. 

   I ran the queries with and without the change, I see the performance boost of roughly 30% in query time. I agree that the result size is not necessarily the same as the query size, but in best (worst?) case, the results would be the same as the query size. Regarding the hard limit, it is a very good point and we can address it by simply using Math.max(getLimit(), queries.size())

thanks
C

On Tuesday, October 17, 2017 at 4:50:25 AM UTC-7, Robert Dale wrote:
Can you share the queries and/or profile() (only useful from v0.2) that show multiKSQ[x] where x is the size of the queries?

What portion of overall queries/frequency do these queries represent?

Do you have any other quantitative analysis supporting your theory?

Looking at the loop, the result size is not necessarily the same as the query size. It is limited by the aggregate of the sub keySliceQuery sizes and a hard limit. I'm not familiar enough with the internals to understand how these are affected or controlled. However, in my own queries, only ones with within() predicates seem to affect the query size. And these are rarely used.



Robert Dale

On Mon, Oct 16, 2017 at 7:57 PM, Calvin Lei <c...@...> wrote:
HI, 
   I am curious why the result place holder is set to size of 4 in MultiKeySliceQuery.java:49? If the incoming query is of size 1000 with corresponding 1000 results. There will be unnecessary resizing taking place. Wouldn't it be fair to set the initial capacity of the array list to be queries.size()?

thanks
 

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/4cac10c4-e001-4789-bb1e-c5b3f995be85%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/5700a0d4-222a-4440-8f37-de121ac9e78a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.