
Re: [Performance Issue] Large partitions formed on janusgraph_ids table leading to read perf issues (throughput reduces to 1/3rd of original)

sauverma
 

In the Scylla Grafana dashboard, this issue shows up as a high number of foreground read tasks.


Re: [Performance Optimization] Optimization around the `system_properties` table interaction

sauverma
 

Thank you folks for getting back.

@Simone3, yes, this issue surfaces as read timeouts from the shard holding the system_properties table (there is only one, unreplicated partition for system_properties).

Based on these observations, we've used the workarounds below to bypass it for now (the code change required in JanusGraph is under test right now):

- set gc_grace_seconds for the system_properties table to 0
- truncate the system_properties table periodically (say, every 2 hours)
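A toy model may help show why lowering gc_grace_seconds helps. It assumes (this is not stated in the thread) that each embedded executor writes one instance-level cell into the single `configuration` partition and deletes it again on shutdown, and that each delete leaves a tombstone that reads keep scanning until a compaction runs after gc_grace_seconds has passed. The class, method, and column names are hypothetical; this is a simulation, not JanusGraph or Scylla code.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only (hypothetical names): one partition keyed by "configuration".
// Assumes each embedded executor writes one registration cell on startup and
// deletes it on shutdown, and that every delete leaves a tombstone.
final class ConfigPartitionModel {

    // A cell in the partition: column name, tombstone flag, write time in seconds.
    static final class Cell {
        final String column;
        final boolean tombstone;
        final long writtenAtSec;

        Cell(String column, boolean tombstone, long writtenAtSec) {
            this.column = column;
            this.tombstone = tombstone;
            this.writtenAtSec = writtenAtSec;
        }
    }

    private final List<Cell> cells = new ArrayList<>();

    // Static (global) option shared by all instances.
    void putStatic(String column, long nowSec) {
        cells.add(new Cell(column, false, nowSec));
    }

    // An executor's embedded instance writes its own dynamic entry.
    void register(String instanceId, long nowSec) {
        cells.add(new Cell("registration." + instanceId, false, nowSec));
    }

    // On shutdown the entry is deleted. For simplicity the shadowed live cell is
    // dropped at once, but the tombstone stays until compaction purges it.
    void deregister(String instanceId, long nowSec) {
        String column = "registration." + instanceId;
        cells.removeIf(c -> c.column.equals(column) && !c.tombstone);
        cells.add(new Cell(column, true, nowSec));
    }

    // Cells a read of the whole partition must walk: live cells plus tombstones.
    int cellsScannedPerRead() {
        return cells.size();
    }

    // Cells that actually carry configuration values.
    long liveCells() {
        return cells.stream().filter(c -> !c.tombstone).count();
    }

    // Compaction: purge tombstones older than gc_grace_seconds.
    void compact(long nowSec, long gcGraceSeconds) {
        cells.removeIf(c -> c.tombstone && nowSec - c.writtenAtSec > gcGraceSeconds);
    }

    public static void main(String[] args) {
        ConfigPartitionModel p = new ConfigPartitionModel();
        long t = 0;
        for (int i = 0; i < 3; i++) {
            p.putStatic("option." + i, t);
        }
        // 1000 short-lived executors, each opening and closing an embedded instance.
        for (int i = 0; i < 1000; i++) {
            p.register("executor-" + i, t++);
            p.deregister("executor-" + i, t++);
        }
        System.out.println("live=" + p.liveCells() + ", scanned=" + p.cellsScannedPerRead());
        p.compact(t, 864_000); // default gc_grace_seconds (10 days): nothing is old enough yet
        System.out.println("gc_grace=864000 -> scanned=" + p.cellsScannedPerRead());
        p.compact(t, 0); // gc_grace_seconds = 0: every tombstone is purgeable
        System.out.println("gc_grace=0 -> scanned=" + p.cellsScannedPerRead());
    }
}
```

Running `main` prints `live=3, scanned=1003`, then `scanned=1003` with the default gc_grace_seconds of 864000, then `scanned=3` with gc_grace_seconds set to 0. In this model only 3 cells carry data, yet every read walks over 1000 tombstones until they become purgeable.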

Thanks


[Performance Issue] Large partitions formed on janusgraph_ids table leading to read perf issues (throughput reduces to 1/3rd of original)

sauverma
 

Hi all

We are using JanusGraph at Zeotap at humongous scale (~70B vertices and ~50B edges), backed by Scylla.

Right now I am facing an issue with the janusgraph_ids table: large partitions are being created in ScyllaDB, and this is causing severe read performance issues. The queries hitting the janusgraph_ids table are range queries, and with large partitions the reads are becoming extremely slow.

I would like to know whether anyone else has observed a similar issue, and whether there is a set of configurations that should be checked, or anything else you would suggest.

Thanks
Saurabh


Re: [Performance Optimization] Optimization around the `system_properties` table interaction

simone3.cattani@...
 

On Mon, Feb 15, 2021 at 05:59 AM, sauverma wrote:
Hi Saurabh,
we are experiencing exactly the same issue: a Spark job with one JanusGraph instance per partition calling a ScyllaDB cluster. Our symptom is a ton of timeout exceptions because read queries fail to reach QUORUM.

If you want to share your proposed solutions, we will be happy to try them.


Re: [Performance Optimization] Optimization around the `system_properties` table interaction

rngcntr
 

Hi Saurabh!

Thanks for reporting that issue! Looking at the open pull requests, I don't see one that addresses this problem. You're always welcome to share your solutions by discussing them here or even submitting PRs directly. Do you already have these fixes in place and use them in your production environment, or are they still an early-stage draft?


[Performance Optimization] Optimization around the `system_properties` table interaction

sauverma
 

Hi all

- When the JanusGraph client interacts with the underlying KV store, it hits the `system_properties` table with a range query where the key is `configuration` (key = 0x636f6e66696775726174696f6e)

- We observed that the JanusGraph client stores all configuration (static + dynamic) under the `configuration` key

- When we run the job with Spark executors, each using JanusGraph in embedded mode, every executor creates executor-level (dynamic) entries under the same `configuration` key

- Thus, as the number of executors increases, the partition with key `configuration` grows into a large partition, and queries with key = `configuration` become range queries that scan this large partition, as seen in the graphs below (taken from the Scylla Monitoring Grafana dashboard)

- I would like to know whether there is a fix in progress for this issue. We at Zeotap are using JanusGraph at tremendous scale (single graphs with 70 billion vertices and 50 billion edges) and have identified a couple of solutions to fix this.
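The key literal in the first bullet is simply the UTF-8 bytes of the string `configuration` written as hex, the 0x-prefixed form cqlsh uses when printing blob values. A small sketch for converting between the two forms, for example to confirm which partition a monitoring tool is flagging (the class and method names are hypothetical):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: converts between a partition key string and its hex form.
final class ConfigKeyHex {

    // Encode a string as lowercase hex of its UTF-8 bytes.
    static String toHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Decode a hex string (with or without a leading "0x") back into UTF-8 text.
    static String fromHex(String hex) {
        String h = hex.startsWith("0x") ? hex.substring(2) : hex;
        byte[] out = new byte[h.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(h.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("0x" + toHex("configuration")); // 0x636f6e66696775726174696f6e
        System.out.println(fromHex("0x636f6e66696775726174696f6e")); // configuration
    }
}
```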



Thanks
Saurabh Verma
+91 7976984604


Re: Why does JanusGraph use two versions of netty?

zblumenf@...
 

Ahh. Actually, I think I may have answered my own question by building from source. It looks like hadoop:2.7.7 and spark-gremlin:3.4.10 still use Netty 3, and the POM just pins the latest Netty 3 version for dependency management. Removing Netty 3 produces the following enforcer error (similar in a few different modules):

Dependency convergence error for io.netty:netty:3.9.9.Final paths to dependency are:
+-org.janusgraph:janusgraph-hadoop:0.6.0-SNAPSHOT
  +-org.apache.tinkerpop:spark-gremlin:3.4.10
    +-io.netty:netty:3.9.9.Final
and
+-org.janusgraph:janusgraph-hadoop:0.6.0-SNAPSHOT
  +-org.apache.hadoop:hadoop-client:2.7.7
    +-org.apache.hadoop:hadoop-hdfs:2.7.7
      +-io.netty:netty:3.6.2.Final

I think that makes sense.  


Why does JanusGraph use two versions of netty?

zblumenf@...
 

I noticed that there are two versions of Netty in the JanusGraph POM.

I'm curious why this is and whether it is being used for something specific. Otherwise, can 3 get bumped to 4?


JanusGraph meetup topic discussion - graph OLAP & algorithms

Ted Wilmes
 

Hello,
I'm working on planning another JanusGraph community meetup and wanted to gauge community interest in an in-depth focus on tackling OLAP/graph algorithm work with JanusGraph. This has been covered briefly in previous meetups, but I think it deserves more focus given the challenges folks face getting JanusGraph/Spark up and running and performing well.

I'm particularly interested in hearing whether others have had success with this route in production, and if not, whether they've employed other techniques to serve their analytics needs (shortest path, clustering, centrality, data science workflows, etc.). In one case on our side, we had good success deploying a separate, custom C++ in-memory graph alongside JG that serves shortest-path requests with much lower latency than JG and Spark could.

Please reach out on this thread or directly to me if you're interested in presenting on this topic or taking part in a panel discussion. I'm currently targeting the March timeframe for the meetup.

Thanks,
Ted


Re: [janusgraph-foundationdb] - New Release

rngcntr
 

Sounds like a good thing to do. We should consider updating our documentation (#54) before that, so that users who get started with the new release don't experience any issues regarding installation and first steps.
But other than that, I strongly support your request to release a new version.


[janusgraph-foundationdb] - New Release

jackson.christopher.lee@...
 

Hi all,

 

I think we should consider doing a release of the janusgraph-foundationdb adapter sometime soon. There have been several changes since the initial 0.1.0 release that warrant a new release. I would like to know whether others agree, and also to discuss a timeline for such a release.

 

Several people have complained to me that the release is outdated and that it's difficult to get started, since there isn't a pre-built binary that people looking to adopt the adapter can download and use.

 

Thanks in advance,

Chris Jackson


Re: [DISCUSS] Thread appears as read after a new reply is posted

Oleksandr Porunov
 

I also now find that groups.io does not offer as good an experience as Google Groups.

1) As noted by Marc, there is no way to see in the UI that there are threads with new messages you haven't seen. Here is the issue for that: https://groups.io/g/GroupManagersForum/topic/30886089
2) We found out that email addresses whose prefix (the part before the `@`) consists only of digits are banned on groups.io, i.e. an address such as 12345<at>example.com cannot subscribe to groups.io.

This is disappointing. I will raise those issues to see what options we have.

Best regards,
Oleksandr Porunov


Re: [DISCUSS] Thread appears as read after a new reply is posted

Oleksandr Porunov
 

Hi Marc,

> 1. After a new reply appears to a thread, the thread still appears as "read" in the web ui. In other words, you have to open every thread to see whether new replies are present.

That's really strange. I will try to check whether it is possible to change this behavior in groups.io.


> 2. There is no possibility to start a thread on the web ui. This feels inconsistent because it *is* possible to do a reply in the web ui.

You can go to https://lists.lfaidata.foundation/g/janusgraph-users/ and click `New Topic` to create a new thread in the web UI, but I think you need to join the group to do so. I will check whether there are some configurations we can change in groups.io to improve this experience. You should find the `New Topic` button on the left side of the screen, as shown below:

If you know of any configuration changes we could make in groups.io to improve this experience, let us know and we will apply them. We can also contact the LF AI & Data Foundation to resolve these issues.

Best regards,
Oleksandr Porunov


[DISCUSS] Thread appears as read after a new reply is posted

HadoopMarc <m.c.delignie@...>
 

Hi all,

After using the new mail lists through the web ui for a while, there are two aspects that are particularly annoying:

1. After a new reply appears to a thread, the thread still appears as "read" in the web ui. In other words, you have to open every thread to see whether new replies are present.

2. There is no possibility to start a thread on the web ui. This feels inconsistent because it *is* possible to do a reply in the web ui.

I know you can get those replies by e-mail, but can anyone advise on a webmail client that has the feel of the old Google Groups? I certainly do not want to clutter my usual e-mail account and client with JanusGraph posts.

Best wishes,  Marc


Re: [DISCUSS] Monthly Video Calls/Office Hours

Jansen, Jan
 

I think a monthly recurring meeting would be just fine.

Do we also want to use a Google Doc to keep notes?


Re: Option for "reply to sender" is too prominent

Oleksandr Porunov
 

The default option is now `Reply to Group`


Re: [DISCUSS] Monthly Video Calls/Office Hours

Oleksandr Porunov
 

As we are now under LF AI & Data, we can ask for a Zoom account for regular JanusGraph meetings.


Option for "reply to sender" is too prominent

Oleksandr Porunov
 

Hi Marc,

It definitely makes sense to prioritize "Group Reply".
I requested John Mertic to change the default option to `Group Reply`.
I'm also voting +1 to remove the option `Reply to Sender`.

Best regards,
Oleksandr Porunov


Option for "reply to sender" is too prominent

Marc de Lignie <m.c.delignie@...>
 

The new mailing list encourages users to use the "reply to sender" option. This means that discussions between the OP and an expert become invisible to other users. It also means that experts cannot answer from the web page but have to use their e-mail client to give answers.

I would prefer to remove the option "reply to sender" or at least stop making it the default option. What do you think?

Best wishes,    Marc


Re: mixed index - Reindex is very slow

hadoopmarc@...
 

The janusgraph user forum was moved to:

https://lists.lfaidata.foundation/g/janusgraph-users

You have to fill in an e-mail address and confirm the request sent to that address.

Regarding the code lines for the MapReduce reindex:
// Run a JanusGraph-Hadoop job to reindex
mgmt = graph.openManagement()
mr = new MapReduceIndexManagement(graph)
mr.updateIndex(mgmt.getGraphIndex("mixedExample"), SchemaAction.REINDEX).get()

As far as I know, this runs on your local machine, and all dependencies are present in the JanusGraph distribution. In other words, you do not need a Hadoop or Spark cluster for this.

Cheers,     Marc