
Re: [DISCUSS] JanusGraph 0.6.0 release

rngcntr
 

If we wait for TinkerPop 3.5.0 (or 3.4.11), we can probably also release the configurable batch sizes in 0.6.0, which should bring a noticeable performance gain if used properly.

Kind regards, Florian


Re: [DISCUSS] JanusGraph 0.6.0 release

Jansen, Jan
 

Hi

It would be great to release JanusGraph 0.6.0.

I just want to mention that TinkerPop 3.5.0 is planned to be released in April.

We don't release new versions very frequently, so should we wait for it before releasing or not?

Best regards, Jan


Re: [DISCUSS] JanusGraph 0.6.0 release

Boxuan Li
 

Thanks for organizing this, Oleksandr! I would like to have Fix potential ThreadLocal transaction leak released in 0.6.0. If there is no further comment I plan to merge it this weekend.

Graph Reindexing Issue Fix: this PR is actually WIP. If I have some time at the weekend, I'll try to see if I can help fix it.

Best regards,
Boxuan Li


[DISCUSS] JanusGraph 0.6.0 release

Oleksandr Porunov
 

Hi everyone,

I would like to start a discussion about the JanusGraph 0.6.0 release. There have been multiple requests from the community about the ETA of the 0.6.0 release.
I wanted to check whether there are any features / bugfixes which we want to include in the 0.6.0 release.
Right now the following PRs / issues are targeted for the 0.6.0 release:
- Reduce configuration options for CFG - will check soon.
- Bump bigtable-hbase-2.x-shaded from 1.16.0 to 1.19.0 - this is just a version bump, but I didn't test JanusGraph compatibility with Bigtable.
- Add experimental support for Java 11 - This PR adds experimental support for Java 11, but it looks like many tests won't work with Java 11. So it's hard to say whether JanusGraph will be compatible with Java 11 when used with HBase, for example, or with different settings. It's also hard to say how TinkerPop and JanusGraph relate in this matter. TinkerPop added Java 11 support in version 3.5.0, but this version isn't released yet. I guess this might influence some TinkerPop tests and functionality, but I didn't check that yet.
- Graph Reindexing Issue Fix - this PR is still under review.
- Add job to run all TinkerPop tests on schedule - I think this issue isn't hard to resolve, so we can probably get it done. Nevertheless, it shouldn't block the 0.6.0 release because we run TinkerPop tests before the release.

Is there any other work we need to add to the 0.6.0 version?
From my point of view, we can release 0.6.0 after the above issues / PRs are resolved or re-targeted. That said, if I missed something, please add the relevant issues / PRs to the milestone or discuss them in this thread.

Best regards,
Oleksandr Porunov


Re: Multiple vertices generated for the same index value and vertex properties missing with RF3

sauverma
 

Hi Marc

I've submitted this as a bug https://github.com/JanusGraph/janusgraph/issues/2515

Thanks


Re: Multiple vertices generated for the same index value and vertex properties missing with RF3

hadoopmarc@...
 


Re: [Performance Issue] Large partitions formed on janusgraph_ids table leading to read perf issues (throughput reduces to 1/3rd of original)

hadoopmarc@...
 


Multiple vertices generated for the same index value and vertex properties missing with RF3

sauverma
 

Hi JG community

We are facing a weird issue with replication factor 3 (RF3) with ScyllaDB as the backend.

- we are using an index for looking up V in the graph
- the observation is that for many vertices with the same index value, multiple vertex IDs are generated as seen below


Has anyone else encountered the same issue with RF3? With RF1 the issue does not occur.

Thanks


Re: [Performance Optimization] Optimization around the `system_properties` table interaction

sauverma
 

On Mon, Feb 22, 2021 at 01:58 AM, <simone3.cattani@...> wrote:
Hi @simone

We were planning to segregate the static configurations from the runtime dynamic configurations (last update TS from client, etc.).
AFAIK only the static configurations are required by the janusgraph clients while initializing.

Thanks
Saurabh


Re: [Performance Optimization] Optimization around the `system_properties` table interaction

simone3.cattani@...
 

Hi @sauverma,
nice, we are truncating the table, too. It's good to have your confirmation that this can safely work around the issue.
We will analyze the `gc_grace_seconds` setting now.

Just out of curiosity: are you fixing it by changing `KCVSConfiguration` to store properties as rows instead of columns?


Re: [Performance Issue] Large partitions formed on janusgraph_ids table leading to read perf issues (throughput reduces to 1/3rd of original)

sauverma
 

In the Scylla Grafana dashboard, this issue shows up as a high number of foreground read tasks.


Re: [Performance Optimization] Optimization around the `system_properties` table interaction

sauverma
 

Thank you folks for getting back.

@Simone3, yes, this issue shows up as read timeouts from the shard holding the system_properties table (there is only 1 partition for system_properties, unreplicated).

Based on our observations, we've used the workarounds below to bypass it for now (the code change required in JanusGraph is under test right now):

- set the gc_grace_seconds for system_properties to 0
- truncate system_properties table periodically (say 2 hours)
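
For anyone who wants to try the same two workarounds, they could be expressed as CQL statements roughly as follows. This is a minimal sketch that only builds the statement strings; the keyspace name `janusgraph` and the helper name `workaround_statements` are assumptions for illustration, and the statements would still have to be executed through cqlsh or a driver.

```python
def workaround_statements(keyspace="janusgraph", table="system_properties"):
    """Build the CQL for the two workarounds (the keyspace name is an assumption)."""
    target = f"{keyspace}.{table}"
    return [
        # Workaround 1: do not retain tombstones for this table
        f"ALTER TABLE {target} WITH gc_grace_seconds = 0;",
        # Workaround 2: run periodically (the post suggests about every 2 hours)
        f"TRUNCATE {target};",
    ]


for stmt in workaround_statements():
    print(stmt)
```

The ALTER statement only needs to run once, while the TRUNCATE would be scheduled externally, for example from cron.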

Thanks


[Performance Issue] Large partitions formed on janusgraph_ids table leading to read perf issues (throughput reduces to 1/3rd of original)

sauverma
 

Hi all

We are using JanusGraph at zeotap at humongous scale (~70B V and 50B E), backed by ScyllaDB.

Right now I am facing an issue with the janusgraph_ids table: large partitions are being created in ScyllaDB, and this leads to severe read performance issues. The queries hitting the janusgraph_ids table are range queries, and with large partitions the reads become very slow.

I would like to know whether anyone else has observed a similar issue, and whether there is a set of configurations that should be checked or anything else you would suggest.

Thanks
Saurabh


Re: [Performance Optimization] Optimization around the `system_properties` table interaction

simone3.cattani@...
 

On Mon, Feb 15, 2021 at 05:59 AM, sauverma wrote:
Hi Saurabh,
we are experiencing the exact same issue: a Spark job with one JanusGraph instance per partition calling a ScyllaDB cluster. Our symptom is a ton of timeout exceptions because read queries fail to reach QUORUM.

If you want to share your proposed solutions, we will be happy to try them.


Re: [Performance Optimization] Optimization around the `system_properties` table interaction

rngcntr
 

Hi Saurabh!

Thanks for reporting that issue! Looking at the open pull requests, I don't see one that addresses this problem. You're always welcome to share your solutions by discussing them here or even submitting PRs directly. Do you already have these fixes in place and use them in your production environment, or is it rather an early-stage draft?


[Performance Optimization] Optimization around the `system_properties` table interaction

sauverma
 

Hi all

- The interaction with the underlying KV store via janusgraph client hits the `system_properties` table with a range query where the key is `configuration` (key = 0x636f6e66696775726174696f6e)

- The observation is that the janusgraph client stores all the configurations (static + dynamic) against the `configuration` key

- When we run the job with spark executors, where each executor is using janusgraph embedded mode, each of these executors creates executor-level entries (dynamic) with the same key `configuration`

- Thus, as the number of executors increases, the particular partition with the key `configuration` becomes a large partition, and queries with key=`configuration` become range queries scanning the large partition, as seen in the graphs below (these are from the scylla monitoring grafana dashboard)

- I would like to know if there is a fix in progress for this issue - we at zeotap are using JanusGraph at tremendous scale (single graphs having 70 billion vertices and 50 billion edges) and have identified a couple of solutions to fix this
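
To make the first and fourth points concrete, here is a small sketch. It decodes the hex key quoted above and then uses a toy in-memory model of a wide-row table to show how a single partition grows with the number of executors. The column names (`static-option`, `executor-N-registration`) and the executor count are made up for illustration and are not JanusGraph's actual storage layout.

```python
from collections import defaultdict

# The partition key quoted above is simply the ASCII bytes of "configuration"
key = bytes.fromhex("636f6e66696775726174696f6e").decode("ascii")
print(key)

# Toy model of a wide-row table: partition key -> {column name: value}
table = defaultdict(dict)
table[key]["static-option"] = "value"  # hypothetical static entry


def register_executor(executor_id):
    # Hypothetical per-executor (dynamic) entry, written under the same key
    table[key][f"executor-{executor_id}-registration"] = "timestamp"


for executor_id in range(1000):
    register_executor(executor_id)

# Every read of `key` now has to scan all of these columns
print(len(table), "partition,", len(table[key]), "columns")
```

In this model, no matter how many executors register, the number of partitions stays at one while the column count keeps growing, which is the shape of the problem described in the list above.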



Thanks
Saurabh Verma
+91 7976984604


Re: Why does JanusGraph use two versions of netty?

zblumenf@...
 

Ahh.  Actually think I may have answered my own question building from source.  Looks like hadoop:2.7.7 and spark-gremlin:3.4.10 are still using netty 3 and the POM just bumped to the latest netty 3 version for dependency management. Removing netty 3 gets the following enforcer error (similar in a few different modules) 

Dependency convergence error for io.netty:netty:3.9.9.Final paths to dependency are:
+-org.janusgraph:janusgraph-hadoop:0.6.0-SNAPSHOT
  +-org.apache.tinkerpop:spark-gremlin:3.4.10
    +-io.netty:netty:3.9.9.Final
and
+-org.janusgraph:janusgraph-hadoop:0.6.0-SNAPSHOT
  +-org.apache.hadoop:hadoop-client:2.7.7
    +-org.apache.hadoop:hadoop-hdfs:2.7.7
      +-io.netty:netty:3.6.2.Final

I think that makes sense.  
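
For anyone reading along, the enforcer's complaint can be reproduced with a few lines of Python. This is only an illustration of what a dependency convergence check looks for (the same artifact reached at different versions through different paths), not how the Maven Enforcer plugin is implemented; the function name `leaf_versions` is made up here.

```python
import re
from collections import defaultdict

ENFORCER_OUTPUT = """\
+-org.janusgraph:janusgraph-hadoop:0.6.0-SNAPSHOT
  +-org.apache.tinkerpop:spark-gremlin:3.4.10
    +-io.netty:netty:3.9.9.Final
and
+-org.janusgraph:janusgraph-hadoop:0.6.0-SNAPSHOT
  +-org.apache.hadoop:hadoop-client:2.7.7
    +-org.apache.hadoop:hadoop-hdfs:2.7.7
      +-io.netty:netty:3.6.2.Final
"""


def leaf_versions(text):
    """Map each leaf artifact (group:artifact) to the set of versions it resolves to."""
    versions = defaultdict(set)
    # Each dependency path is separated by a line containing only "and"
    for path in text.split("\nand\n"):
        lines = [line for line in path.splitlines() if line.strip()]
        # The last line of a path is the artifact whose version is in question
        match = re.match(r"\s*\+-([^:]+):([^:]+):(.+)", lines[-1])
        group, artifact, version = match.groups()
        versions[f"{group}:{artifact}"].add(version)
    return versions


# Artifacts that arrive at more than one version fail convergence
conflicts = {
    name: sorted(found)
    for name, found in leaf_versions(ENFORCER_OUTPUT).items()
    if len(found) > 1
}
print(conflicts)
```

Here the only conflicting artifact is `io.netty:netty`, pulled in by spark-gremlin and by hadoop-hdfs at two different 3.x versions.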


Why does JanusGraph use two versions of netty?

zblumenf@...
 

Noticed that there are two versions of Netty in the POM for JanusGraph: 
 
 
Curious why this is and if it is being used for something specific.  Otherwise can 3 get bumped to 4? 


JanusGraph meetup topic discussion - graph OLAP & algorithms

Ted Wilmes
 

Hello,
I'm working on planning another JanusGraph community meetup and wanted to gauge community interest in doing an in-depth focus on tackling OLAP/graph algorithmic work with JanusGraph. This has been covered briefly in previous meetups, but I think it is worthy of more focus due to the challenges folks face getting JanusGraph/Spark up and running and working performantly. I'm particularly interested in hearing if others have had success with this route in production, and if not, if they've employed other techniques to serve their analytics needs (shortest path, clustering, centrality, data science workflows, etc.). In one case on our side, we had good success deploying a separate, custom C++ in-memory graph alongside JG that serves shortest path requests with a much lower latency than JG and Spark could. Please reach out on this thread or directly to me if you're interested in presenting on this topic or taking part in a panel discussion. I'm currently targeting the March timeframe for the meetup.

Thanks,
Ted


Re: [janusgraph-foundationdb] - New Release

rngcntr
 

Sounds like a good thing to do. We should consider updating our documentation (#54) before that, so that users who get started with the new release don't experience any issues regarding installation and first steps.
But other than that, I strongly support your request to release a new version.
