
Re: Property keys unique per label

hadoopmarc@...
 

Hi Laura,

Indeed, graph-wide unique property key names are a limitation. But to be honest: if two properties have different data-value types, I would say these are different properties, so why give them the same name?
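The following is a toy Python model of the constraint. It is not JanusGraph code. In the model, the schema is graph-wide, so each property key name maps to exactly one data type. This mirrors how a key is defined once with `mgmt.makePropertyKey(name).dataType(...)` in the JanusGraph management API. The label-qualified names such as `part_weight` are a hypothetical naming convention, shown only as one way to keep per-label properties apart.

```python
class GraphSchema:
    """Toy model of a graph-wide schema: one data type per property key name."""

    def __init__(self):
        self.keys = {}  # property key name -> Python type

    def make_property_key(self, name, data_type):
        existing = self.keys.get(name)
        if existing is not None and existing is not data_type:
            # Key names are global, so one name cannot carry a second type,
            # no matter which vertex label it would be used on.
            raise ValueError(
                f"property key {name!r} already defined as {existing.__name__}"
            )
        self.keys[name] = data_type
        return name


schema = GraphSchema()
schema.make_property_key("weight", float)
try:
    schema.make_property_key("weight", str)  # same name, different type
except ValueError as err:
    print(err)

# Hypothetical workaround: qualify the key name with the vertex label.
schema.make_property_key("part_weight", float)
schema.make_property_key("shipment_weight", str)
print(sorted(schema.keys))
```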

Best wishes,    Marc


Re: How to create users and roles

hadoopmarc@...
 

Hi Jonathan,

User authorization for Gremlin Server was introduced in TinkerPop 3.5.0, see https://tinkerpop.apache.org/docs/current/reference/#authorization
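The sketch below is a rough mental model of what "users and roles" means here. It is neither TinkerPop's code nor its configuration format; the actual settings are on the reference page linked above. In the model, authentication establishes who the user is, and group-based authorization then decides which traversal sources that user may use. All user, group, and traversal source names are hypothetical.

```python
# Hypothetical mapping of authenticated users to groups (roles).
USER_GROUPS = {
    "alice": {"analysts"},
    "bob": {"admins"},
}

# Hypothetical grants: group -> traversal sources that group may use.
GRANTS = {
    "analysts": {"g_readonly"},
    "admins": {"g", "g_readonly"},
}


def is_authorized(user, traversal_source):
    """True if any of the user's groups is granted the traversal source."""
    groups = USER_GROUPS.get(user, set())  # unknown users get no groups
    return any(traversal_source in GRANTS.get(group, set()) for group in groups)


print(is_authorized("alice", "g_readonly"))  # True
print(is_authorized("alice", "g"))           # False
print(is_authorized("mallory", "g"))         # False (unknown user)
```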

JanusGraph will use TinkerPop 3.5.x in its upcoming 0.6.0 release. If you want, you can already build the 0.6.0-SNAPSHOT distribution archives from master, using:

mvn clean install -Pjanusgraph-release -Dgpg.skip=true -DskipTests=true
Best wishes,     Marc


How to create users and roles

jonathan.mercier.fr@...
 

Dear,

I have not found in the documentation how to create and manage users and roles in order to control data access.
On this page https://docs.janusgraph.org/basics/server/ we can see that there is connection and authentication through HTTP or WebSocket.
But I do not see where it is described how to manage users and roles.

Thanks


Property keys unique per label

Laura Morales <lauretas@...>
 

The documentation says "Property key names must be unique in the graph". Does it mean that it's not possible to have property keys that are unique *per label*? In other words, can I have two distinct properties with the same name but different data-value types, as long as they are applied to vertexes with different labels?


Re: janusgraph and deeplearning

hadoopmarc@...
 

Hi Jonathan,

One thing is not yet clear to me: does your graph fit into a single node (regarding memory and GPU) or do you plan to use distributed pytorch? Either way, I guess it would be most efficient to use a two step process:

  1. get all data from janusgraph and store it on disk in a suitable format
  2. run PyTorch Geometric (possibly in a distributed way) from the files on disk
JanusGraph only supports the Hadoop InputFormats to retrieve graph data in a distributed way. Some teams have succeeded in retrieving data directly from partitions of the JanusGraph storage backends (not using any JanusGraph API, see here), which could be done in a custom PyTorch loader, but this is not documented (yet).
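Here is a minimal sketch of the hand-off between the two steps. It assumes step 1 wrote an edge list of JanusGraph vertex ids to CSV; the file layout and the `src`/`dst` column names are assumptions. PyTorch Geometric expects edges as an `edge_index` of shape [2, num_edges] holding contiguous node indices 0..N-1, so the large, sparse JanusGraph ids are remapped first. The code stays in plain Python. Wrapping the result with `torch.tensor(edge_index, dtype=torch.long)` would be the last step before building a PyG `Data` object.

```python
import csv
import io

# Stand-in for a file written in step 1 (one row per exported edge).
EXPORTED = """src,dst
4104,8200
4104,12296
8200,12296
"""


def load_edge_index(fh):
    """Read (src, dst) rows; return an id->index map and a 2 x E edge list."""
    index_of = {}  # JanusGraph vertex id -> contiguous index 0..N-1
    sources, targets = [], []
    for row in csv.DictReader(fh):
        for key in ("src", "dst"):
            index_of.setdefault(int(row[key]), len(index_of))
        sources.append(index_of[int(row["src"])])
        targets.append(index_of[int(row["dst"])])
    return index_of, [sources, targets]


node_index, edge_index = load_edge_index(io.StringIO(EXPORTED))
print(node_index)  # {4104: 0, 8200: 1, 12296: 2}
print(edge_index)  # [[0, 0, 1], [1, 2, 2]]
```

Keeping `node_index` makes it possible to map model outputs back to JanusGraph vertex ids afterwards.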

Cool that you apply janusgraph to this use case, so do not hesitate to ask for more details!

Marc


Re: How to split graph in multiple graphml files and load them separately

hadoopmarc@...
 

Hi Laura,

Without checking this in the code, it only seems logical that the graph id is ignored, because you have to supply the io readers with an existing Graph instance. Apparently it was chosen to make the user responsible for supplying the Graph that corresponds to the graph id in the xml file.
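The id sits on the `<graph>` element of the GraphML file. Because the reader is handed an existing Graph, a loading script can read that id itself and use it to decide which graph to open. Below is a minimal sketch using Python's standard XML parser. The mapping from graph id to a JanusGraph properties file is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

DOC = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="supply_chain" edgedefault="directed">
    <node id="data_source1:id0"/>
  </graph>
</graphml>"""

# Hypothetical: the user decides which configured graph each graph id maps to.
GRAPH_CONFIGS = {"supply_chain": "conf/janusgraph-supply-chain.properties"}

root = ET.fromstring(DOC)
graph_id = root.find("g:graph", NS).get("id")
print(graph_id, "->", GRAPH_CONFIGS[graph_id])
```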

Marc


Re: Performance Improvement

Vinayak Bali
 

Laura, that is helpful; I will go through it and try to implement it. 

Also, if there are any configurations that can be tuned for better performance, please share them.

On Mon, Jul 26, 2021 at 2:22 PM Laura Morales <lauretas@...> wrote:
There's a BUILDING file with instructions in the repo.
 
 
 







[ANNOUNCEMENT] JanusGraph enabled donations on LFX Crowdfunding

Oleksandr Porunov
 

The JanusGraph Technical Steering Committee is excited to announce that JanusGraph is now accepting donations.


As you may know, most JanusGraph contributors are not full-time JanusGraph employees, so we came up with the idea of collecting donations from the community to be able to hire full-time employees for JanusGraph.


With your help, JanusGraph will be able to produce releases much more often, and we will be able to develop JanusGraph much faster.


The JanusGraph Technical Steering Committee guarantees full transparency with the community about every penny spent. We are accepting contributions via LFX Crowdfunding, which has an open ledger where you can check all the transactions made and their descriptions.


The LFX Crowdfunding page where JanusGraph accepts donations is: https://crowdfunding.lfx.linuxfoundation.org/projects/janusgraph


Best regards,

Oleksandr Porunov

on behalf of JanusGraph TSC


Re: Performance Improvement

Laura Morales <lauretas@...>
 

There's a BUILDING file with instructions in the repo.
 
 
 

Sent: Monday, July 26, 2021 at 10:31 AM
From: "Vinayak Bali" <vinayakbali16@...>
To: janusgraph-users@...
Subject: Re: [janusgraph-users] Performance Improvement

Hi Boxuan, 
 
Thank you for your response. I am not sure how I can build JanusGraph from the master branch. If you can share the steps/procedure to do so, I can check; otherwise I will need to wait for the new release. 
 
My use case consists of a single node label with self-relations between its nodes. You can consider it a BOM (bill of materials) in the supply chain. 
The JanusGraph and Cassandra configurations are the defaults set during installation.
 
The data loading script takes the CSV files as input, divides the files into different batches, and loads the batches using multi-threading. If you need more details, I can share a generic script with you and also the metrics. 
 
Thanks & Regards,
Vinayak


Re: Performance Improvement

Vinayak Bali
 

Hi Boxuan, 

Thank you for your response. I am not sure how I can build JanusGraph from the master branch. If you can share the steps/procedure to do so, I can check; otherwise I will need to wait for the new release. 

My use case consists of a single node label with self-relations between its nodes. You can consider it a BOM (bill of materials) in the supply chain. 
The JanusGraph and Cassandra configurations are the defaults set during installation.

The data loading script takes the CSV files as input, divides the files into different batches, and loads the batches using multi-threading. If you need more details, I can share a generic script with you and also the metrics. 
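Below is a minimal Python sketch of the batching and threading structure described in the paragraph above. The actual script is Groovy and was not shared. `load_batch` is a hypothetical placeholder that only counts rows; in a real loader it would send the batch's vertices or edges to the graph and commit once per batch. The batch size and thread count are arbitrary.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

CSV_DATA = "id,name\n" + "".join(f"{i},part{i}\n" for i in range(10))
BATCH_SIZE = 4  # arbitrary; tune against the real backend
THREADS = 3     # arbitrary


def batches(rows, size):
    """Yield lists of at most `size` rows from an iterator."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def load_batch(batch):
    """Hypothetical placeholder: send one batch to the graph and commit it."""
    return len(batch)


rows = csv.DictReader(io.StringIO(CSV_DATA))
with ThreadPoolExecutor(max_workers=THREADS) as pool:
    loaded = list(pool.map(load_batch, batches(rows, BATCH_SIZE)))

print(loaded)       # [4, 4, 2]
print(sum(loaded))  # 10
```

One transaction per batch keeps each commit bounded in size. Whether more threads actually help depends on the storage backend, so it is worth measuring with the real data.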

Thanks & Regards,
Vinayak

On Mon, Jul 26, 2021 at 1:38 PM Boxuan Li <liboxuan@...> wrote:
Hi Vinayak,

Would you be able to build JanusGraph from master branch and try again? The upcoming 0.6.0 release contains many optimizations which might be helpful. 

Without knowing more details of your use case (your queries, your loading script, your JanusGraph configs, your JanusGraph metrics, your Cassandra metrics), it’s very hard to give any concrete suggestion. Anyway, I would strongly recommend you try out the master version first and see how it goes.

Best,
Boxuan



Re: Performance Improvement

Boxuan Li
 

Hi Vinayak,

Would you be able to build JanusGraph from master branch and try again? The upcoming 0.6.0 release contains many optimizations which might be helpful. 

Without knowing more details of your use case (your queries, your loading script, your JanusGraph configs, your JanusGraph metrics, your Cassandra metrics), it’s very hard to give any concrete suggestion. Anyway, I would strongly recommend you try out the master version first and see how it goes.

Best,
Boxuan

On Mon, Jul 26, 2021 at 3:55 PM, Vinayak Bali <vinayakbali16@...> wrote:

Hi All, 

I have been using JanusGraph for a while. The use case I am working on consists of 1.5 million nodes and 3 million edges. I prepared a batch-loading Groovy script. The performance of the data loading script is as follows: 

Nodes: 5 mins
Edges: 13 mins
Total: 18 mins

Also, the count query including edges takes minutes to execute. 
Both JanusGraph (0.5.2) and Cassandra are installed on the same instance.
 
Hardware Configuration:
RAM: 92 GB
Cores: 48 

I would like expert suggestions/steps that I can follow to improve the performance. Please share your thoughts on this.

Thanks & Regards,
Vinayak


Performance Improvement

Vinayak Bali
 

Hi All, 

I have been using JanusGraph for a while. The use case I am working on consists of 1.5 million nodes and 3 million edges. I prepared a batch-loading Groovy script. The performance of the data loading script is as follows: 

Nodes: 5 mins
Edges: 13 mins
Total: 18 mins
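For reference, the reported timings translate into the average throughput computed below. This is simple arithmetic on the numbers above and assumes the stated minutes are wall-clock times.

```python
nodes, edges = 1_500_000, 3_000_000
node_secs, edge_secs = 5 * 60, 13 * 60

node_rate = nodes / node_secs                        # vertices per second
edge_rate = edges / edge_secs                        # edges per second
overall = (nodes + edges) / (node_secs + edge_secs)  # elements per second

print(f"{node_rate:.0f} vertices/s, {edge_rate:.0f} edges/s, {overall:.0f} elements/s overall")
# 5000 vertices/s, 3846 edges/s, 4167 elements/s overall
```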

Also, the count query including edges takes minutes to execute. 
Both JanusGraph (0.5.2) and Cassandra are installed on the same instance.
 
Hardware Configuration:
RAM: 92 GB
Cores: 48 

I would like expert suggestions/steps that I can follow to improve the performance. Please share your thoughts on this.

Thanks & Regards,
Vinayak


Re: How to split graph in multiple graphml files and load them separately

Laura Morales <lauretas@...>
 

I've also noticed that graphml files can specify an "id" for the <graph> node, but I guess this has no effect on Janus at all? Like, it's completely ignored? Am I right?

Sent: Monday, July 26, 2021 at 7:50 AM
From: "Laura Morales" <lauretas@...>
To: janusgraph-users@...
Cc: janusgraph-users@...
Subject: Re: [janusgraph-users] How to split graph in multiple graphml files and load them separately

Apparently, you have an external naming convention to recognize shared vertices
The convention is simply to use custom IDs in graphml, like this

<node id="data_source1:id0"/>
<node id="data_source1:id1"/>
...

<node id="data_source2:id0"/>
<node id="data_source2:id1"/>
...

When I "merge" all the nodes/edges of the two graphml files into a single file and load the new file into Janus, Janus will replace all the IDs with its custom Long values. But all the vertexes and edges are imported correctly otherwise. Only the IDs have been changed from String to Long. For my particular use case I don't mind the IDs being changed, but having to "merge" and reinsert the whole graph every time is really inconvenient and doesn't really scale beyond a small graph. I need to "merge" all the files because if I load them separately, Janus will not treat two vertexes with the same ID from two separate files as the same vertex; it will create 2 nodes and give them 2 different IDs.
I feel like this problem probably wouldn't exist if the graphml or graphson loaders would use the user-defined IDs instead of replacing them with Longs.


Re: How to split graph in multiple graphml files and load them separately

Laura Morales <lauretas@...>
 

Apparently, you have an external naming convention to recognize shared vertices
The convention is simply to use custom IDs in graphml, like this

<node id="data_source1:id0"/>
<node id="data_source1:id1"/>
...

<node id="data_source2:id0"/>
<node id="data_source2:id1"/>
...

When I "merge" all the nodes/edges of the two graphml files into a single file and load the new file into Janus, Janus will replace all the IDs with its custom Long values. But all the vertexes and edges are imported correctly otherwise. Only the IDs have been changed from String to Long. For my particular use case I don't mind the IDs being changed, but having to "merge" and reinsert the whole graph every time is really inconvenient and doesn't really scale beyond a small graph. I need to "merge" all the files because if I load them separately, Janus will not treat two vertexes with the same ID from two separate files as the same vertex; it will create 2 nodes and give them 2 different IDs.
I feel like this problem probably wouldn't exist if the graphml or graphson loaders would use the user-defined IDs instead of replacing them with Longs.
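The manual "merge" described above can at least be scripted. Below is a minimal Python sketch, assuming each input file uses globally unique node ids like the ones shown and the standard GraphML namespace. It keeps the first occurrence of each node id and all edges. It does not merge `<key>` declarations, so real files with `<data>` properties would need that step as well. It also does not remove the need to reload the merged file in full.

```python
import xml.etree.ElementTree as ET

NS = "http://graphml.graphdrawing.org/xmlns"
ET.register_namespace("", NS)  # serialize GraphML tags without a prefix


def merge_graphml(documents):
    """Merge GraphML strings into one graph, deduplicating nodes by id."""
    merged = ET.Element(f"{{{NS}}}graphml")
    graph = ET.SubElement(merged, f"{{{NS}}}graph", edgedefault="directed")
    seen = set()
    for doc in documents:
        source_graph = ET.fromstring(doc).find(f"{{{NS}}}graph")
        for node in source_graph.findall(f"{{{NS}}}node"):
            if node.get("id") not in seen:  # first file to define an id wins
                seen.add(node.get("id"))
                graph.append(node)
        for edge in source_graph.findall(f"{{{NS}}}edge"):
            graph.append(edge)  # endpoints refer to the shared custom ids
    return ET.tostring(merged, encoding="unicode")


FILE_1 = f"""<graphml xmlns="{NS}"><graph edgedefault="directed">
  <node id="data_source1:id0"/><node id="data_source2:id0"/>
  <edge source="data_source1:id0" target="data_source2:id0"/>
</graph></graphml>"""
FILE_2 = f"""<graphml xmlns="{NS}"><graph edgedefault="directed">
  <node id="data_source2:id0"/><node id="data_source2:id1"/>
  <edge source="data_source2:id0" target="data_source2:id1"/>
</graph></graphml>"""

merged = merge_graphml([FILE_1, FILE_2])
print(merged.count("<node "), merged.count("<edge "))  # 3 2
```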


Re: janusgraph and deeplearning

jonathan.mercier.fr@...
 

Hi Marc,
Thanks for your reply.
I have knowledge data from multiple sources, so first (i) I had to load those data into JanusGraph, and (ii) I need to apply a reconciliation algorithm which generates the knowledge graph. Then I would like to train a graph neural network on this new model with PyTorch or, if that is not possible, with Deeplearning4j (I prefer Python).

Thanks


Re: How to split graph in multiple graphml files and load them separately

hadoopmarc@...
 

Hi Laura,

I do not see an easy solution. Although JanusGraph supports custom vertex ids, I do not believe this is compatible with the gremlin io readers (at least, not out of the box, I tried...).

An alternative collaboration model would be to set up Gremlin Server. Then you have the Gremlin language variants available (e.g. Python) to write new and modified data directly to a shared graph (without using GraphML files for transport). Apparently, you have an external naming convention to recognize shared vertices, so you could add the external names as properties and define a JanusGraph index for that.
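Here is a minimal sketch of that suggestion. A Python dict stands in for the JanusGraph index on the external name; the property name `ext_id` and the in-memory "graph" are hypothetical. Each contributor's data is loaded separately. Every node and edge endpoint goes through a get-or-create lookup, so a vertex already created from another file is reused instead of duplicated. Against a real Gremlin Server, that lookup would typically be the Gremlin upsert pattern `g.V().has('ext_id', x).fold().coalesce(__.unfold(), __.addV('node').property('ext_id', x))` sent through gremlin-python, with `'node'` as a placeholder label.

```python
import itertools


class InMemoryGraph:
    """Hypothetical stand-in for a shared graph with an index on 'ext_id'."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.by_ext_id = {}  # plays the role of the index: ext_id -> vertex id
        self.edges = set()   # a set, so reloading a file adds no duplicate edges

    def get_or_create(self, ext_id):
        """Upsert: reuse the vertex with this external name, else create it."""
        if ext_id not in self.by_ext_id:
            self.by_ext_id[ext_id] = next(self._next_id)
        return self.by_ext_id[ext_id]

    def load(self, nodes, edges):
        """Load one contributor's (already parsed) file into the shared graph."""
        for ext_id in nodes:
            self.get_or_create(ext_id)
        for src, dst in edges:
            self.edges.add((self.get_or_create(src), self.get_or_create(dst)))


graph = InMemoryGraph()
# File 1 references a vertex that is only defined in file 2.
graph.load(["data_source1:id0"], [("data_source1:id0", "data_source2:id0")])
graph.load(["data_source2:id0", "data_source2:id1"],
           [("data_source2:id0", "data_source2:id1")])

print(len(graph.by_ext_id), sorted(graph.edges))  # 3 [(1, 2), (2, 3)]
```

Because lookups go by external name, reloading only the file that changed updates the shared graph without recreating the rest.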

Best wishes,     Marc


Re: janusgraph and deeplearning

hadoopmarc@...
 

Hi Jonathan,

Can you elaborate on why you make the connection between JanusGraph and deep learning? The only connection I can imagine is the wish to use graph data stored in JanusGraph to train a GNN. However, I do not think you can leverage the message passing of TinkerPop VertexPrograms, because it is Java-based and cannot use GPUs.

Best wishes,    Marc


janusgraph and deeplearning

jonathan.mercier.fr@...
 

Dear,
I am looking to use JanusGraph together with a deep learning framework such as PyTorch.
Does anyone have some experience or examples on this subject?
Currently I use parquet -> dataframe -> pytorch

Thanks for your help


Fw: How to split graph in multiple graphml files and load them separately

Laura Morales <lauretas@...>
 

ERRATA

1. if I load one file, then the other, Janus will not create the edges that have "origin" in one file and "target" in another, because I guess it does not find the target vertex in the same file.
Correction: it does create the edge, but instead of linking it to the vertex with that "id" from the other file, it creates a new empty vertex (with no properties) and assigns it a new ID.


How to split graph in multiple graphml files and load them separately

Laura Morales <lauretas@...>
 

Assuming that my colleagues and I are working on different "parts" of the same graph, each of us creates one GraphML file and then we'd like to load our files into the graph (we're using .readGraph("file.graphml")). My problems are:

1. if I load one file, then the other, Janus will not create the edges that have "origin" in one file and "target" in another, because I guess it does not find the target vertex in the same file. Janus assigns its own IDs, so it looks like we have to "merge" all the files into one before inserting data into the graph
2. because of 1., we cannot "update" only the part of the graph where the file has changed; instead I have to recreate the whole graph every time

I'd like to know your comments about how we could organize a collaboration like this, i.e. people working on different parts of the same graph, merging them together, and updating only the parts that have changed. "readGraph" is very useful because a file can be loaded in one line without having to write any custom Groovy scripts for parsing all the files.
Thank you.
