Re: The count query based on the vertex traversal edges is too slow!!!


"alex...@gmail.com" <alexand...@...>
 

Hi,

The `count` step doesn't use a mixed index right now. There is a WIP PR that will allow the count step to use a mixed index: https://github.com/JanusGraph/janusgraph/pull/2200
For now, the best you can do is a direct indexQuery count. See this comment on how to use indexQuery to speed up a count: https://github.com/JanusGraph/janusgraph/issues/926#issuecomment-401442381

range(low, high) is a deep pagination problem. It always searches from 0 to `high`, but you can change your logic to work around deep pagination. Here is a comment where I discuss the workaround: https://github.com/JanusGraph/janusgraph/issues/986#issuecomment-451601715
You can read more about the deep pagination problem here: https://www.elastic.co/guide/en/elasticsearch/guide/current/pagination.html
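
To illustrate why range(low, high) gets slower on later pages, here is a minimal Python sketch. It is not JanusGraph or Elasticsearch code: the sorted list `SORTED_IDS` and the helpers `page_by_offset` and `page_after` are hypothetical stand-ins. It assumes results have a stable sort key, so a page can be located by "everything after the last key I saw" instead of by an offset.

```python
from bisect import bisect_right

# Hypothetical stand-in for an index: 100,000 ids kept in sorted order.
SORTED_IDS = list(range(0, 200_000, 2))


def page_by_offset(low, high):
    """Emulate range(low, high): walk from result 0 and discard the first `low` hits."""
    scanned = 0
    page = []
    for position, item in enumerate(SORTED_IDS):
        if position >= high:
            break
        scanned += 1
        if position >= low:
            page.append(item)
    return page, scanned


def page_after(last_seen, size):
    """Keyset paging: seek directly past the last id of the previous page."""
    start = bisect_right(SORTED_IDS, last_seen)  # binary search, no scan from 0
    page = SORTED_IDS[start:start + size]
    return page, len(page)


offset_page, offset_cost = page_by_offset(30000, 30010)
keyset_page, keyset_cost = page_after(SORTED_IDS[29999], 10)
print(offset_page == keyset_page)  # True
print(offset_cost, keyset_cost)    # 30010 10
```

Both calls return the same ten ids, but the offset version touches 30,010 entries while the keyset version touches 10. The trade-off is that keyset paging can only step from one page to the next; it cannot jump to an arbitrary page number.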

Best regards,
Oleksandr

On Tuesday, October 27, 2020 at 1:53:03 PM UTC+2 HadoopMarc wrote:
Hi

You have a lot of perseverance, and that is welcome here! This is an open source community, so use this perseverance to find a solution for us all.

Some resources on janusgraph performance:

Best wishes,    Marc

On Tuesday, October 27, 2020 at 10:57:04 AM UTC+1 wan...@... wrote:
Is there no other way to speed up paging, especially querying the last few pages?
I also have a requirement where many edges may need to be loaded. At present, the loading speed is not ideal.

On Tuesday, October 27, 2020 at 3:17:05 PM UTC+8 HadoopMarc wrote:
Hi,

As explained earlier, the range(30000, 30010) causes a table scan starting at result 0. There is no way to circumvent this using range.

As to the many targetIds during development, you can do:

targetIds = g.V().hasLabel('InstanceMetric').has('type', neq('network')).has('vlabel', 'InstanceMetric').id().limit(10).toList()

HTH,     Marc

On Tuesday, October 27, 2020 at 4:42:05 AM UTC+1 wan...@... wrote:
I tested this in the terminal and it does speed up the query. In real development, however, I can't use this method, because it loads a lot of vertex ids and adds network overhead.

In addition, is there any way to speed up queries for the last few pages of data in a paged query?

On Monday, October 26, 2020 at 6:27:36 PM UTC+8 HadoopMarc wrote:
Hi,

You are right: when the index is not used in the outV() step, JanusGraph resorts to a full table scan until it has enough results, 10 in the first case and 30,010 in the second. Can you also try my other suggestion to first get the targetIds and use these in your main query? My hope is that the inE() step is sufficiently fast. The edges returned from inE() already contain the vertex ids, which can be matched locally against targetIds.
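
As a rough illustration of what "matched locally" means here, the following Python sketch filters edges by their source vertex id without fetching any vertex. It is not Gremlin: the `Edge` record and `filter_edges_by_source` helper are hypothetical, and it only assumes that each edge carries the ids of both of its endpoints.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    """Hypothetical in-memory edge record carrying both endpoint ids."""
    edge_id: int
    out_v: int  # id of the source vertex
    in_v: int   # id of the target vertex


def filter_edges_by_source(edges, target_ids):
    """Keep edges whose source vertex id is in target_ids, without loading any vertex."""
    wanted = set(target_ids)  # a set gives constant-time membership tests
    return [e for e in edges if e.out_v in wanted]


edges = [Edge(1, 10, 20), Edge(2, 11, 20), Edge(3, 12, 21), Edge(4, 10, 21)]
matched = filter_edges_by_source(edges, target_ids=[10, 12])
print([e.edge_id for e in matched])  # [1, 3, 4]
```

The cost of this approach is holding targetIds in memory on the client, which is why it only pays off when that list is reasonably small.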

Marc

On Monday, October 26, 2020 at 8:21:32 AM UTC+1 wan...@... wrote:

Hi,
Please help me with the following question.
On Monday, October 26, 2020 at 2:49:43 PM UTC+8 HadoopMarc wrote:
Hi,

The first line in the code suggestion in my previous post should have been (added id() step):

targetIds = g.V().hasLabel('InstanceMetric').has('type', neq('network')).has('vlabel', 'InstanceMetric').id().toList()

Best wishes,    Marc

On Sunday, October 25, 2020 at 4:57:58 PM UTC+1 HadoopMarc wrote:
Hi,

Apparently, the query planner is not able to use the index for the outV() step. Can you see what happens if we split the query like this (not tested):

targetIds = g.V().hasLabel('InstanceMetric').has('type', neq('network')).has('vlabel', 'InstanceMetric').toList()

g.V().hasLabel('InstanceMetric').has('type', neq('network')).has('vlabel', 'InstanceMetric')
    .inE('Cause').has('status', -1).has('isManual', false)
        .has('promote', within(-1,0,2,3)).has('vlabel', 'Cause')
        .where(outV().has(id, within(targetIds)))

Note that you can use the where() step instead of the as/select construct, just for readability.

HTH,   Marc

On Saturday, October 24, 2020 at 3:19:12 PM UTC+2 wan...@... wrote:
The total number of edges is 15,000.

Only 10,000 edges in total meet the query condition.

On Saturday, October 24, 2020 at 9:18:48 PM UTC+8 wd w wrote:
profile.png

On Tuesday, October 20, 2020 at 10:38:46 PM UTC+8 HadoopMarc wrote:
Can you show the profiling of the query using the profile() step?

Best wishes,    Marc

On Tuesday, October 20, 2020 at 2:22:59 PM UTC+2 wan...@... wrote:

g.V().hasLabel("Instance").has("instanceId", P.within("12", "34")).bothE('Cause').has('enabled', true).as('e').bothV().has('instanceId', P.within('64', '123')).select('e').count();

The above count query executes very slowly. What can be done to speed it up?

I have created a compositeIndex and a mixedIndex for instanceId and enabled.

How should I convert this query to a direct index query?
