Re: Aggregating edges based on the source & target vertex attributes

HadoopMarc <bi...@...>

Hi Vishnu,

The processing time does not really surprise me: JanusGraph has to do everything in Java. For the typical JanusGraph use case, the storage backend is the limiting factor and the Java processing does not really matter. If you want to run this query fast in memory on multiple cores, you are better off with Python and dask or the like (doing the aggregation on a single dataframe with the edge id, inV label and outV label). I would not be surprised if pandas, using a single core, already does this within a second.
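As a rough illustration of that single-table aggregation, here is a minimal sketch in plain Python. It is a standard-library stand-in for the dataframe group-by, not pandas or dask code, and it makes no claim about actual timings. The helper names `make_edges` and `aggregate_edges`, the organization names and the table sizes are all assumptions made up for the example.

```python
import random
import time
from collections import defaultdict


def aggregate_edges(edges):
    """Sum the hours per (outV label, inV label) pair.

    `edges` is an iterable of (edge_id, out_label, in_label, hours) rows,
    i.e. the flat edge table described above.
    """
    totals = defaultdict(int)
    for _edge_id, out_label, in_label, hours in edges:
        totals[(out_label, in_label)] += hours
    return dict(totals)


def make_edges(n_vertices, n_edges, orgs, seed=0):
    """Build a synthetic edge table (hypothetical sizes and labels)."""
    rng = random.Random(seed)
    # Assign every vertex a random organization label.
    org_of = [rng.choice(orgs) for _ in range(n_vertices)]
    edges = []
    for edge_id in range(n_edges):
        out_v = rng.randrange(n_vertices)
        in_v = rng.randrange(n_vertices)
        edges.append((edge_id, org_of[out_v], org_of[in_v], rng.randint(1, 5)))
    return edges


if __name__ == "__main__":
    orgs = ["engineering", "sales", "marketing"]
    # Sizes are kept small here; raise them to approach the real dataset.
    edges = make_edges(n_vertices=1_000, n_edges=50_000, orgs=orgs)
    start = time.perf_counter()
    totals = aggregate_edges(edges)
    elapsed = time.perf_counter() - start
    print(f"{len(edges)} edges -> {len(totals)} organization pairs in {elapsed:.3f}s")
```

The aggregation is a single pass over the edge rows, which is the same shape of work a dataframe group-by with a sum would do on the three columns mentioned above.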

For the queries given above, I believe only a single core is used when they are run as OLTP queries. Because this N x N query is not easy to parallelize for TinkerPop, you have to take care how you run it as an OLAP query. I would guess that withComputer(SparkGraphComputer) with a single Spark executor with 8 cores will work best, because then the Spark cores share the memory. This is automatically true for spark.master=local[*].

Best wishes,    Marc

PS Thanks for introducing me to the Indian numbering system. Happily, you do not have 1.5 crore vertices!

On Monday, 21 December 2020 at 09:16:08 UTC+1, vishnu gajendran wrote:

Thank you Kevin and Marc for the quick response. I tried both queries and they work as expected. My use case requires running such a query on a bigger dataset. I ran the query for 1 lakh vertices and 5 million edges on my desktop using the in-memory backend (assuming that in-memory would be faster than external data stores) and it took roughly 2 minutes to execute. My desktop has 8 logical cores and 64 GB RAM. A few questions regarding this:

1. Is this the expected performance for such aggregation queries in JanusGraph?
2. Will increasing the number of cores (i.e. processing power) improve the performance of the query?

The dataset I am dealing with can be as big as 1.5 lakh vertices and 20 million edges, and I would like to support the above aggregation query in real time (i.e. in a few seconds, not minutes). Can we achieve this using JanusGraph?

On Thursday, December 17, 2020 at 7:51:56 PM UTC+5:30 kt...@... wrote:
Thanks for improving it!  Always good to learn more.

On Thu, Dec 17, 2020 at 6:11 AM HadoopMarc <b...@...> wrote:
And here a small variation without the keys and with some code formatting:

        union(select('a').values('organization'), select('b').values('organization')).fold()
==>[marketing, engineering]=2
==>[sales, marketing]=2
==>[engineering, sales]=3
==>[engineering, marketing]=2

On Thursday, 17 December 2020 at 14:50:11 UTC+1, kt...@... wrote:

This may not be optimal, but seems to work:

g.E().hasLabel('collaboration').as('e').
  outV().values('organization').as('1').
  select('e').inV().values('organization').as('2').
  select('e').
  group().
    by(select('1', '2')).
    by(values('collaborationHours').sum()).
  unfold()

==>{1=engineering, 2=marketing}=2
==>{1=marketing, 2=engineering}=2
==>{1=engineering, 2=sales}=3
==>{1=sales, 2=marketing}=2

Note that some of your Gremlin has leading spaces in ' collaborationHours', which I had to remove. Also, with the data you provided, the engineering/sales total is 3, not 4.
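The totals can be cross-checked outside the graph. The following is only a sketch in plain Python, with the vertex and edge data copied by hand from the example graph quoted below; the function name `hours_by_org_pair` is made up for the illustration.

```python
from collections import defaultdict

# personId -> organization, copied from the addVertex calls
organization = {1: "engineering", 2: "sales", 3: "marketing", 4: "engineering"}

# (out personId, in personId, collaborationHours), copied from the addEdge calls
collaborations = [(1, 2, 1), (1, 3, 2), (2, 3, 2), (3, 4, 2), (4, 2, 2)]


def hours_by_org_pair(organization, collaborations):
    """Sum collaborationHours per (out organization, in organization)."""
    totals = defaultdict(int)
    for out_id, in_id, hours in collaborations:
        totals[(organization[out_id], organization[in_id])] += hours
    return dict(totals)


totals = hours_by_org_pair(organization, collaborations)
for (org1, org2), hours in totals.items():
    print(org1, org2, hours)
# engineering sales 3
# engineering marketing 2
# sales marketing 2
# marketing engineering 2
```

The engineering/sales pair is fed by two edges, p1 to p2 with 1 hour and p4 to p2 with 2 hours, which gives 3.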


On Wed, Dec 16, 2020 at 11:57 PM vishnu gajendran <gg...@...> wrote:

I request your help with a JanusGraph query which I am trying to construct. Let's consider the following graph, where each vertex denotes a person and an edge between any two vertices denotes collaboration between them.

p1 = graph.addVertex(T.label, 'person', 'personId', 1, 'organization', "engineering")
p2 = graph.addVertex(T.label, 'person', 'personId', 2, 'organization', "sales")
p3 = graph.addVertex(T.label, 'person', 'personId', 3, 'organization', "marketing")
p4 = graph.addVertex(T.label, 'person', 'personId', 4, 'organization', "engineering")

p1.addEdge('collaboration', p2, 'collaborationHours', 1)
p1.addEdge('collaboration', p3, 'collaborationHours', 2)

p2.addEdge('collaboration', p3, 'collaborationHours', 2)

p3.addEdge('collaboration', p4, ' collaborationHours', 2)

p4.addEdge('collaboration', p2, ' collaborationHours', 2)

Expected Result is the following table:

Organization1   Organization2   Total Collaboration Hours
Engineering     Sales           4
Engineering     Marketing       2
Sales           Marketing       2
Marketing       Engineering     2

Here, I am trying to aggregate the "person to person" graph into "organization to organization" graph. Does JanusGraph support such aggregation queries? If yes, can you please help me with the query for the same?


You received this message because you are subscribed to the Google Groups "JanusGraph users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgr...@....