Create index on property to speed up groupCount().by()


Yingjie Li <ying...@...>
 

Thanks Marc for the pointers.




HadoopMarc <bi...@...>
 

Hi Yingjie,

Thanks for posting back your results. I am afraid I do not see an easy way out. Your issue is related to counting label types:

https://github.com/JanusGraph/janusgraph/issues/926

Using OLAP can speed up your query by parallelizing the vertex retrieval that the original Gremlin query requires, but only if your storage backend can handle the load.

More speculatively (never tried this myself), you might try to bypass JanusGraph and access the indexing backend directly to get approximate group counts of the type values:

https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-aggregations-bucket-terms-aggregation.html
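
A minimal, untested sketch of what such a direct request could look like, using only the Python standard library. The Elasticsearch address, the index name "janusgraph_mbytype" and the assumption that 'type' is indexed as an exact-match keyword field are all guesses about the setup, not details from this thread. The counts that a terms aggregation returns can be approximate when the index is spread over several shards.

```python
import json
import urllib.request

# Assumptions (not from the thread): Elasticsearch is reachable at ES_URL,
# the mixed index "mbyType" is stored in an Elasticsearch index named
# "janusgraph_mbytype", and "type" is indexed as an exact-match keyword field.
ES_URL = "http://localhost:9200"
ES_INDEX = "janusgraph_mbytype"


def build_terms_aggregation(field, max_buckets=1000):
    """Build a search body that returns only per-value document counts."""
    return {
        "size": 0,  # skip the hits themselves, only the aggregation is needed
        "aggs": {"by_value": {"terms": {"field": field, "size": max_buckets}}},
    }


def parse_group_counts(response):
    """Turn the aggregation buckets into a {value: count} dictionary."""
    buckets = response["aggregations"]["by_value"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


def fetch_group_counts(field="type"):
    """Send the aggregation to Elasticsearch and return the counts."""
    request = urllib.request.Request(
        f"{ES_URL}/{ES_INDEX}/_search",
        data=json.dumps(build_terms_aggregation(field)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return parse_group_counts(json.load(reply))
```

Calling fetch_group_counts("type") against a live cluster would then return a dictionary of type values and their counts without the values being known beforehand. Only distinct values up to max_buckets are returned.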

Best wishes,    Marc



Yingjie Li <ying...@...>
 

Hello Marc,

Thanks for the pointer. I created the MixedIndex (mbyType) and tried it with:

 graph.indexQuery("mbyType","v.type:(typeA, typeB, typeC)").vertexTotals()

It does return the total number of vertices for the given types. However, this is different from what I would like to have, i.e., to retrieve the list of types (not known beforehand) together with the vertex count for each type.

Any suggestions?

Thanks



HadoopMarc <bi...@...>
 

Hi Yingjie,

Unfortunately, your query requires all vertices to be retrieved from the backend. The index is only used to select the vertices that need to be retrieved, so this does not help in your case.

There is a way out, though.  Once you have defined a MixedIndex, take a look at:

https://docs.janusgraph.org/index-backend/direct-index-query/#query-totals

HTH,     Marc



Yingjie Li <ying...@...>
 

Hello, 
What kind of index can we create on a user-defined property, say 'type', to speed up
g.V().groupCount().by('type')

I have created a composite index for 'type' and made sure it is in 'ENABLED' status:
type=mgmt.getPropertyKey("type")
mgmt.buildIndex('byType',Vertex.class).addKey(type).buildCompositeIndex()

After that, a query like g.V().has('type', '..').count() runs very fast, but g.V().groupCount().by('type') still runs very slowly, and the index does not have any impact.

A MixedIndex does not seem to help either.