
Re: JanusGraph meetup topic discussion - graph OLAP & algorithms

Ted Wilmes
 

Great! We've done 20-minute slots in the past; that may work well here with around 10-15 minutes of presentation and 5-10 for discussion/Q&A? In reality, that'll just scratch the surface, but it will give folks some jumping-off points.

For others, what graph algorithms have you operationalized or would like to? What worked, what didn't? Real world use cases (successes or failures!) are always of keen interest to the group.

--Ted

On Tue, Feb 16, 2021 at 1:35 AM <hadoopmarc@...> wrote:
Hi Ted,

Yes, a short overview of OLAP questions from the user list sounds like a good idea and is easy to prepare. It need not be long; 10 minutes including a few questions for clarifications would do. If you want to discuss these issues in more depth, more time is needed, of course.

Best wishes,       Marc


Re: JanusGraph meetup topic discussion - graph OLAP & algorithms

hadoopmarc@...
 

Hi Ted,

Yes, a short overview of OLAP questions from the user list sounds like a good idea and is easy to prepare. It need not be long; 10 minutes including a few questions for clarifications would do. If you want to discuss these issues in more depth, more time is needed, of course.

Best wishes,       Marc


Re: JanusGraph meetup topic discussion - graph OLAP & algorithms

Ted Wilmes
 

Hey Dylan,
Thanks for the links. That's a promising set of projects. I think a brief survey of OLAP graph engines that may be applicable to JG users would be very interesting. In addition to looking at alternative OLAP engines, I think the question of integration is an interesting one. For example, TP Spark pulls data directly out of JG. I find this attractive from the standpoint of not having to maintain a mirror image of the OLTP graph, but we pay a large performance penalty. Alternatively, a mirror image OLAP graph can be maintained, likely using the same change feed that JG ingests. A third alternative, which may be feasible using the in-memory storage backend and one of the darker corners of the JG code base, the FulgoraGraphComputer, could possibly be made to work in a zero-copy fashion. Anyway, this is not as exciting as the selection/development of the OLAP engine itself, but I think the integration will play a big part in ease of use and adoption.

--Ted

On Fri, Feb 12, 2021 at 4:49 PM Dylan Bethune-Waddell <dylan.bethune.waddell@...> wrote:
Hi Ted,

Great idea Ted. Wanted to mention KatanaGraph (website, github). It's basically a port of this codebase called Galois (website, github). Appears to be a group of UT Austin researchers taking their impressive results (paper) solving various OLAP graph computing problems into open source (3-Clause BSD License). From what I've gathered poking around the new codebase vs. old, and the demo server you can launch a notebook on, they aim to commercialize the distributed GPU aspect of Galois after getting it production ready as katana "enterprise". The guts of it exist in the Galois codebase and they do refer to it - could be a good conversation to have in the JanusGraph community.

It seems that KatanaGraph and cool stuff like rapids.ai spark-rapids all use the Apache Arrow format, which might be an integration to consider. Another interesting project is GraphBLAS, which is a spec but now has concrete implementations, including this one, which is from a "competitor" to KatanaGraph, gunrock. IIRC the gunrock direction-optimized BFS code is faster on power-law graphs than the implementation of BFS in katana/galois, which might be interesting in terms of how Gremlin expects to do its OLAP traversals.

Best,
Dylan

On Thu, Feb 11, 2021 at 11:51 AM <hadoopmarc@...> wrote:
Hi Ted,

Most probably you recognize my nickname from the answers I provided on this user forum on OLAP attempts with JanusGraph. I also co-authored:

https://tinkerpop.apache.org/docs/current/recipes/#connected-components

showing the need to test the scalability of graph algorithms.
I am interested in participating in the meeting and am open to suggestions on where contributions are most needed (no new material, so as part of a panel or presenting old material).

Best wishes,     Marc


Re: JanusGraph meetup topic discussion - graph OLAP & algorithms

Ted Wilmes
 

Hi Marc,
Yes, I most definitely recognize your nickname and have been a beneficiary of many of your answers, blog posts, etc. Glad to hear you're interested in participating. You've been prolific on the lists and I'm wondering if you have a top 5 of OLAP items that you see people have trouble with over and over? A brief presentation of your responses and pointers to what you've already written would probably be very helpful for folks who are attempting the Spark path.

Thanks,
Ted


On Thu, Feb 11, 2021 at 10:51 AM <hadoopmarc@...> wrote:
Hi Ted,

Most probably you recognize my nickname from the answers I provided on this user forum on OLAP attempts with JanusGraph. I also co-authored:

https://tinkerpop.apache.org/docs/current/recipes/#connected-components

showing the need to test the scalability of graph algorithms.
I am interested in participating in the meeting and am open to suggestions on where contributions are most needed (no new material, so as part of a panel or presenting old material).

Best wishes,     Marc


Re: JanusGraph meetup topic discussion - graph OLAP & algorithms

Dylan Bethune-Waddell
 

Hi Ted,

Great idea Ted. Wanted to mention KatanaGraph (website, github). It's basically a port of this codebase called Galois (website, github). Appears to be a group of UT Austin researchers taking their impressive results (paper) solving various OLAP graph computing problems into open source (3-Clause BSD License). From what I've gathered poking around the new codebase vs. old, and the demo server you can launch a notebook on, they aim to commercialize the distributed GPU aspect of Galois after getting it production ready as katana "enterprise". The guts of it exist in the Galois codebase and they do refer to it - could be a good conversation to have in the JanusGraph community.

It seems that KatanaGraph and cool stuff like rapids.ai spark-rapids all use the Apache Arrow format, which might be an integration to consider. Another interesting project is GraphBLAS, which is a spec but now has concrete implementations, including this one, which is from a "competitor" to KatanaGraph, gunrock. IIRC the gunrock direction-optimized BFS code is faster on power-law graphs than the implementation of BFS in katana/galois, which might be interesting in terms of how Gremlin expects to do its OLAP traversals.
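
For readers who have not met direction-optimized BFS before, here is a minimal Python sketch of the general idea: expand the frontier top-down while it is small, and switch to a bottom-up pass (each unvisited vertex looks for a parent in the frontier) once the frontier gets large. This is not gunrock's or Galois's code. The function name, the `switch_ratio` parameter and the simple size-based switching rule are assumptions made for illustration, and the graph is assumed to be undirected with every vertex present as a key.

```python
def direction_optimized_bfs(adj, source, switch_ratio=0.25):
    """Return {vertex: hop distance} for an undirected graph.

    adj maps every vertex to the set of its neighbours. switch_ratio is a
    hypothetical tuning knob: the bottom-up pass is used when the frontier
    is larger than switch_ratio times the number of unvisited vertices.
    """
    dist = {source: 0}
    frontier = {source}
    level = 0
    while frontier:
        level += 1
        unvisited = adj.keys() - dist.keys()
        if len(frontier) > switch_ratio * max(len(unvisited), 1):
            # Bottom-up: every unvisited vertex checks whether one of its
            # neighbours is in the current frontier.
            next_frontier = {v for v in unvisited if adj[v] & frontier}
        else:
            # Top-down: every frontier vertex pushes to unvisited neighbours.
            next_frontier = {n for v in frontier for n in adj[v] if n not in dist}
        for v in next_frontier:
            dist[v] = level
        frontier = next_frontier
    return dist


if __name__ == "__main__":
    graph = {
        "a": {"b", "c", "d"},
        "b": {"a", "e"},
        "c": {"a"},
        "d": {"a"},
        "e": {"b"},
    }
    print(direction_optimized_bfs(graph, "a"))
```

On power-law graphs the frontier typically explodes after a few hops, which is where the bottom-up pass can save work, since an unvisited vertex can stop at the first neighbour it finds in the frontier.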

Best,
Dylan

On Thu, Feb 11, 2021 at 11:51 AM <hadoopmarc@...> wrote:
Hi Ted,

Most probably you recognize my nickname from the answers I provided on this user forum on OLAP attempts with JanusGraph. I also co-authored:

https://tinkerpop.apache.org/docs/current/recipes/#connected-components

showing the need to test the scalability of graph algorithms.
I am interested in participating in the meeting and am open to suggestions on where contributions are most needed (no new material, so as part of a panel or presenting old material).

Best wishes,     Marc


Re: JanusGraph meetup topic discussion - graph OLAP & algorithms

hadoopmarc@...
 

Hi Ted,

Most probably you recognize my nickname from the answers I provided on this user forum on OLAP attempts with JanusGraph. I also co-authored:

https://tinkerpop.apache.org/docs/current/recipes/#connected-components

showing the need to test the scalability of graph algorithms.
I am interested in participating in the meeting and am open to suggestions on where contributions are most needed (no new material, so as part of a panel or presenting old material).

Best wishes,     Marc


Re: Inconsistent composite index status after transaction failure

hadoopmarc@...
 

The JanusGraph docs and GitHub repo talk about ghost vertices or phantom vertices:

https://docs.janusgraph.org/advanced-topics/eventual-consistency/#ghost-vertices
https://docs.janusgraph.org/basics/transactions/#transaction-configuration
https://github.com/JanusGraph/janusgraph/issues/2176

What happens if you try to delete the vertex with my_id = 2?

If this works, I am afraid there is not a more helpful answer than:
https://lists.lfaidata.foundation/g/janusgraph-users/topic/79936778#3898

If deleting the vertex does not work, you might want to mention this discussion in issue 2176, explaining why your issue is different from a ghost vertex.

Best wishes,    Marc


JanusGraph meetup topic discussion - graph OLAP & algorithms

Ted Wilmes
 

Hello,
I'm working on planning another JanusGraph community meetup and wanted to gauge community interest in doing an in-depth focus on tackling OLAP/graph algorithmic work with JanusGraph. This has been covered briefly in previous meetups, but I think it is worthy of more focus due to the challenges folks face getting JanusGraph/Spark up and running and performing well. I'm particularly interested in hearing if others have had success with this route in production, and if not, if they've employed other techniques to serve their analytics needs (shortest path, clustering, centrality, data science workflows, etc.). In one case on our side, we had good success deploying a separate, custom C++ in-memory graph alongside JG that serves shortest path requests with a much lower latency than JG and Spark could. Please reach out on this thread or directly to me if you're interested in presenting on this topic or taking part in a panel discussion. I'm currently targeting the March timeframe for the meetup.

Thanks,
Ted


Re: JanusGraph 0.5.3 using Gremlin Server 3.4.10

hadoopmarc@...
 

Also check:

https://github.com/JanusGraph/janusgraph/commit/7fbf6567f259b2514e64e7a70e668e7975b38a72

This suggests that no new library conflicts occur.

Marc


Re: JanusGraph 0.5.3 using Gremlin Server 3.4.10

hadoopmarc@...
 

Hi James,

The best approach is to do a custom build of JanusGraph with the version of TinkerPop changed (you can use -DskipTests because the TinkerPop API has not changed). With the Maven dependency plugin it is easy to find newly introduced library version conflicts, which may have to be solved manually by excluding unwanted versions. It will require some study of the entire project because JanusGraph has an extensive hierarchy of modules, each with its own pom.xml. It is also a kind of magic, but with only a few version conflicts the number of permutations is small ...

Best wishes,    Marc


Inconsistent composite index status after transaction failure

simone3.cattani@...
 

I have nodes with two properties: my_id (long) and package_name (string).
I have defined two composite indexes
* on my_id
* on my_id + package_name (unique)

I'm using JanusGraph 0.5.2 with CQL (actually ScyllaDB), no index-backend configured.

Now I have a situation where, considering a pair (my_id = 2, package_name = foo):
* searching for g.V().has('my_id', 2) produces an empty result
* searching for g.V().has('my_id', 2).has('package_name', 'foo') returns one node, actually a node that doesn't really exist

My hypothesis is: during the write transaction, some failure has occurred and somehow the status of the second index was not cleaned up.

Can I clean it up manually by removing the index entry?
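
To make the hypothesis above concrete, here is a toy Python model of it. It is only an illustration of the failure mode being described, not JanusGraph's actual storage layout or commit protocol: the class, its three dictionaries and the `fail_after_writes` parameter are all invented for the example, and the write order is an assumption.

```python
class ToyStore:
    """Toy model: a vertex table plus two independently written index tables."""

    def __init__(self):
        self.vertices = {}              # vertex id -> properties
        self.by_my_id = {}              # my_id -> vertex id
        self.by_my_id_and_package = {}  # (my_id, package_name) -> vertex id

    def add_vertex(self, vertex_id, my_id, package_name, fail_after_writes=None):
        # The three writes are applied one by one with no rollback, so a
        # failure in between leaves whatever was already written.
        writes = [
            lambda: self.by_my_id_and_package.update({(my_id, package_name): vertex_id}),
            lambda: self.by_my_id.update({my_id: vertex_id}),
            lambda: self.vertices.update(
                {vertex_id: {"my_id": my_id, "package_name": package_name}}
            ),
        ]
        for done, write in enumerate(writes):
            if fail_after_writes is not None and done >= fail_after_writes:
                raise RuntimeError("simulated failure in the middle of a commit")
            write()

    def find_stale_entries(self):
        """Return unique-index keys that point at a vertex that does not exist."""
        return [
            key
            for key, vertex_id in self.by_my_id_and_package.items()
            if vertex_id not in self.vertices
        ]


if __name__ == "__main__":
    store = ToyStore()
    try:
        store.add_vertex(1, my_id=2, package_name="foo", fail_after_writes=1)
    except RuntimeError:
        pass
    print(store.by_my_id.get(2))       # lookup on my_id alone finds nothing
    print(store.find_stale_entries())  # the pair index still has an entry
```

In this toy model the lookup on my_id alone is empty while the lookup on the pair still resolves to a vertex id with no vertex behind it, which matches the symptoms described above. Whether the same thing happened in the real graph is not something the sketch can confirm.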


JanusGraph 0.5.3 using Gremlin Server 3.4.10

james.jstroud@...
 

Hi, my team is using JanusGraph version 0.5.3. I want my team to use Gremlin Server 3.4.10 as opposed to the 3.4.6 that comes bundled with JanusGraph 0.5.3.
I suppose it would work based upon this (and reading the change logs)

I searched the forums and could not see if anyone was using this combination.

I just want to check if any JanusGraph users of version 0.5.3 are using Apache TinkerPop Gremlin Server 3.4.10.


If anyone is, could you let me know? That would help. Thanks
James Stroud


Re: Janusgraph connect with MySQL Storage Backend

rngcntr
 

Oh nice, I wrote the janusbench tool for myself and didn't expect anyone to discover it or even use it. Great to see, and thanks for advertising it @Boxuan! ;)


Re: Coalesce() step behaves differently than Or() step, with exception in sum() step

hadoopmarc@...
 

This question was also asked on the Gremlin users group, where it can best be answered:

https://groups.google.com/g/gremlin-users/c/-oBWRUxF_Hw


Re: Janusgraph connect with MySQL Storage Backend

Madhan Neethiraj
 

Boxuan,

Thanks for the pointer to janusbench. I will try this and update this thread in a few days.

Madhan

From: <janusgraph-users@...> on behalf of BO XUAN LI <liboxuan@...>
Reply-To: <janusgraph-users@...>
Date: Monday, February 8, 2021 at 2:46 AM
To: <janusgraph-users@...>
Subject: Re: [janusgraph-users] Janusgraph connect with MySQL Storage Backend

 

Hi Madhan,

 

Have you checked out https://github.com/rngcntr/janusbench ? I never used it personally but it looks interesting and might be helpful.

 

Best regards,

Boxuan



On Feb 8, 2021, at 4:04 PM, Madhan Neethiraj <madhan@...> wrote:

 

Hi Marc,

 

Thanks! This is a work-in-progress implementation, and will need to go through more testing - especially to understand the performance and tuning aspects. Once these are done, this feature can be announced via a blog and/or other channels.

 

Are there JanusGraph tests available to cover performance aspects of backend storage implementations? That would be a big help.

 

Thanks,

Madhan

 

From: <janusgraph-users@...> on behalf of <hadoopmarc@...>
Reply-To: <janusgraph-users@...>
Date: Sunday, February 7, 2021 at 3:10 AM
To: <janusgraph-users@...>
Subject: Re: [janusgraph-users] Janusgraph connect with MySQL Storage Backend

 

Hi Madhan

It is exciting news that you have just made available this work under the APL2.0 license!!! Standard RDBMS, though not as scalable as the NoSQL storage backends, might provide an easy start for many future JanusGraph projects because any devops team can have one available in the blink of an eye. Of course, this announcement deserves a more prominent spot than just this outpost of the JanusGraph community. Do you have any plans to announce this in more detail in a future blog post or conference contribution? In particular, it would be interesting to hear about the tuning of postgresql for the JanusGraph workloads (many small requests with a need for small round trip delays).

Best wishes,     Marc 

 


Coalesce() step behaves differently than Or() step, with exception in sum() step

cmilowka
 

I am trying to replace coalesce() with or(), which is generally faster, but with or() the subsequent sum() step fails:
 
gremlin> graph3 = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> graph3.io(graphml()).readGraph('data/grateful-dead.xml')
==>null
gremlin> g3 = graph3.traversal()
==>graphtraversalsource[tinkergraph[vertices:808 edges:8049], standard]
gremlin> g3.V('89').values("performances")
==>219
gremlin> g3.V('89').values("performances").sum()
==>219
gremlin> g3.V('89').or(values("performances")).sum()
==> org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Number
gremlin> g3.V('89').or(values("performances"),__.constant("10000")).sum()
==> org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Number
gremlin> g3.V('89').coalesce(values("performances"),__.constant("10000")).sum()
==>219
 
Is this a bug, or am I doing something wrong?
CM
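
One likely explanation for the difference, sketched below in plain Python rather than Gremlin: or() is a filter step, so it passes the incoming vertex through unchanged when any child traversal yields a result, whereas coalesce() emits the results of the first child traversal that yields anything. The helper names mirror the Gremlin steps but are toy stand-ins written for this example, and a vertex is modelled as a plain dict.

```python
def values(key):
    """Child traversal: emit the property value if the element has it."""
    return lambda element: [element[key]] if key in element else []


def constant(value):
    """Child traversal: always emit the same value."""
    return lambda element: [value]


def or_step(elements, *children):
    # Filter semantics: keep the incoming element itself when any child
    # traversal produces at least one result.
    return [e for e in elements if any(child(e) for child in children)]


def coalesce_step(elements, *children):
    # Map semantics: emit the results of the first child traversal that
    # produces anything, and drop the incoming element.
    out = []
    for e in elements:
        for child in children:
            results = child(e)
            if results:
                out.extend(results)
                break
    return out


if __name__ == "__main__":
    vertex = {"id": "89", "performances": 219}
    print(or_step([vertex], values("performances")))        # the vertex itself
    print(coalesce_step([vertex], values("performances"),
                        constant(10000)))                   # the property value
```

Under this reading, sum() after or() receives the vertex rather than a number, which is consistent with the TinkerVertex cast error in the transcript. Note also that constant("10000") in the transcript is a string; if that fallback branch were ever taken, sum() would presumably fail on it as well, so a numeric constant is probably what was intended.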
 


Re: Where does Computation happens

hadoopmarc@...
 

Hi Dany,

Some answers:
  1. Computation happens inside JanusGraph (not the storage backend), and JanusGraph runs as part of Gremlin Server.
  2. Yes, a query and its computations run on a single Gremlin Server instance.
  3. None. If you run g.V().count(), which is a full table scan, Gremlin Server will not run out of memory, but if you have billions of vertices the query will take days. For the kinds of workloads that you worry about, JanusGraph has some initial support for OLAP-type operations in which all data are loaded by a Spark cluster and computation results are returned to the JanusGraph instance.
Best wishes,   Marc
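
As a rough illustration of the third answer, the Python sketch below shows why a streaming count stays small in memory but still costs time proportional to the number of vertices. It is a generic analogy, not JanusGraph code: `scan_vertices` is a hypothetical stand-in for a full scan that hands out one vertex at a time.

```python
def scan_vertices(total):
    """Stand-in for a full scan: yields one vertex id at a time."""
    for vertex_id in range(total):
        yield vertex_id


def count(vertices):
    """Consume the stream and keep only a running counter in memory."""
    n = 0
    for _ in vertices:
        n += 1
    return n


if __name__ == "__main__":
    print(count(scan_vertices(1_000_000)))
```

Only the counter is held at any moment, so memory use does not grow with the graph, but every vertex still has to be visited once by a single process. That serial visit is the part an OLAP engine such as Spark spreads across a cluster.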


Where does Computation happens

Dany <danyinb@...>
 

Hi Team,

I have a question about distributed query execution. I have JanusGraph set up with Cassandra as distributed storage, and I am worried about the performance of complex queries.

1. Where does my query computation happen? Is it in the JanusGraph Gremlin server or on the distributed storage?
2. If the execution is on a Gremlin Server, do all the computations happen on a single Gremlin Server instance, or is the query distributed?
3. What is the max traversal of vertices and edges that can be handled by janusgraph server? eg. counts?

--
Regards,

Dany


Re: Janusgraph connect with MySQL Storage Backend

Boxuan Li
 

Hi Madhan,

Have you checked out https://github.com/rngcntr/janusbench ? I never used it personally but it looks interesting and might be helpful.

Best regards,
Boxuan

On Feb 8, 2021, at 4:04 PM, Madhan Neethiraj <madhan@...> wrote:

Hi Marc,
 
Thanks! This is a work-in-progress implementation, and will need to go through more testing - especially to understand the performance and tuning aspects. Once these are done, this feature can be announced via a blog and/or other channels.
 
Are there JanusGraph tests available to cover performance aspects of backend storage implementations? That would be a big help.
 
Thanks,
Madhan
 
From: <janusgraph-users@...> on behalf of <hadoopmarc@...>
Reply-To: <janusgraph-users@...>
Date: Sunday, February 7, 2021 at 3:10 AM
To: <janusgraph-users@...>
Subject: Re: [janusgraph-users] Janusgraph connect with MySQL Storage Backend
 
Hi Madhan

It is exciting news that you have just made available this work under the APL2.0 license!!! Standard RDBMS, though not as scalable as the NoSQL storage backends, might provide an easy start for many future JanusGraph projects because any devops team can have one available in the blink of an eye. Of course, this announcement deserves a more prominent spot than just this outpost of the JanusGraph community. Do you have any plans to announce this in more detail in a future blog post or conference contribution? In particular, it would be interesting to hear about the tuning of postgresql for the JanusGraph workloads (many small requests with a need for small round trip delays).

Best wishes,     Marc 



Re: Janusgraph connect with MySQL Storage Backend

Madhan Neethiraj
 

Hi Marc,

 

Thanks! This is a work-in-progress implementation, and will need to go through more testing - especially to understand the performance and tuning aspects. Once these are done, this feature can be announced via a blog and/or other channels.

 

Are there JanusGraph tests available to cover performance aspects of backend storage implementations? That would be a big help.

 

Thanks,

Madhan

 

From: <janusgraph-users@...> on behalf of <hadoopmarc@...>
Reply-To: <janusgraph-users@...>
Date: Sunday, February 7, 2021 at 3:10 AM
To: <janusgraph-users@...>
Subject: Re: [janusgraph-users] Janusgraph connect with MySQL Storage Backend

 

Hi Madhan

It is exciting news that you have just made available this work under the APL2.0 license!!! Standard RDBMS, though not as scalable as the NoSQL storage backends, might provide an easy start for many future JanusGraph projects because any devops team can have one available in the blink of an eye. Of course, this announcement deserves a more prominent spot than just this outpost of the JanusGraph community. Do you have any plans to announce this in more detail in a future blog post or conference contribution? In particular, it would be interesting to hear about the tuning of postgresql for the JanusGraph workloads (many small requests with a need for small round trip delays).

Best wishes,     Marc