GroupCount performance


ever...@...
 

I have the following query:
g.V().groupCount().by(T.label)

It does exactly what I want to accomplish. The problem is that label is not indexed, so the query takes ages to complete for larger vertex counts. A 20k-vertex set takes about 5 seconds to process; larger sets have run for up to 30 minutes without returning. How can I improve this performance without indexing the label? Note that labels cannot currently be indexed (https://github.com/JanusGraph/janusgraph/issues/283), and indexing a label is not an option for my current design anyway.


Daniel Kuppitz <me@...>
 

An index wouldn't help; the query is a full scan anyway. However, you should run it in OLAP mode, where you should gain performance through parallelisation.

Cheers,
Daniel
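
For readers unfamiliar with OLAP traversals, the suggestion above could look roughly like the following Gremlin Console sketch. It assumes a JanusGraph distribution with the TinkerPop Hadoop and Spark plugins available; the properties file path is an assumption that depends on your storage backend and installation, and is not something stated in this thread.

```groovy
// Gremlin Console sketch. Assumes the Hadoop and Spark plugins are active
// (e.g. via `:plugin use tinkerpop.hadoop` and `:plugin use tinkerpop.spark`).

// Open a HadoopGraph view of the stored graph. The path below is an
// assumption: use the properties file that matches your storage backend.
graph = GraphFactory.open('conf/hadoop-graph/read-cassandra.properties')

// Bind the traversal source to a graph computer so the traversal runs as an
// OLAP job instead of a single-threaded OLTP scan.
g = graph.traversal().withComputer(SparkGraphComputer)

// Same query as in the original post; the full scan is now distributed
// across Spark workers.
g.V().groupCount().by(T.label)
```

The traversal itself is unchanged; only the traversal source differs. Whether this is faster in practice depends on cluster size and job startup overhead, so small graphs may still run quicker in OLTP.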


On Wed, Nov 29, 2017 at 9:50 AM, <ever...@...> wrote:
I have the following query:
g.V().groupCount().by(T.label)

It does exactly what I want to accomplish. The problem is, label is not indexed and so it takes ages to complete for a larger vertex count. 20k set takes about 5 seconds to process. Larger sets have taken up to 30 minutes without returning. How can I improve on this performance without indexing the label? Note that labels are not currently allowed to be indexed: https://github.com/JanusGraph/janusgraph/issues/283 and indexing a label is not an option for my current design anyway.

--
You received this message because you are subscribed to the Google Groups "JanusGraph users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-users+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-users/91454769-1c02-43dc-8e75-ebb16c258eb8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


"Everly O." <ever...@...>
 

I have no experience with OLAP, so it sounds like I have some learning to do. I saw this online; hopefully it's a good starting point: http://docs.janusgraph.org/latest/hadoop-tp3.html. If not, any references would be helpful!


On Wednesday, November 29, 2017 at 4:35:21 PM UTC-6, Daniel Kuppitz wrote:
An index wouldn't help, it's a full scan anyway. However, you should run that in OLAP; this way you should gain performance through parallelisation.

Cheers,
Daniel

