Re: janusgraph and deeplearning
hadoopmarc@...
Hi Jonathan,
One thing is not yet clear to me: does your graph fit into a single node (regarding memory and GPU), or do you plan to use distributed PyTorch? Either way, I guess it would be most efficient to use a two-step process:
It's cool that you are applying JanusGraph to this use case, so do not hesitate to ask for more details! Marc
Re: How to split graph in multiple graphml files and load them separately
hadoopmarc@...
Hi Laura,
Without checking this in the code, it seems logical that the graph id is ignored, because you have to supply the IO readers with an existing Graph instance. Apparently the design makes the user responsible for supplying the Graph that corresponds to the graph id in the XML file. Marc
Re: Performance Improvement
Vinayak Bali
Laura, that is helpful; I will go through it and try to implement it. Also, if there are any configurations that can be tuned for better performance, please share them. On Mon, Jul 26, 2021 at 2:22 PM Laura Morales <lauretas@...> wrote:
[ANNOUNCEMENT] JanusGraph enabled donations on LFX Crowdfunding
The JanusGraph Technical Steering Committee is excited to announce that JanusGraph is now accepting donations. As you may know, most JanusGraph contributors are not working on JanusGraph full time, so we came up with the idea of collecting donations from the community in order to hire full-time employees to work on JanusGraph. With your help, JanusGraph will be able to produce releases much more often and we will be able to develop JanusGraph much faster. The JanusGraph Technical Steering Committee guarantees to be fully transparent with the community about every penny spent. We are accepting contributions via LFX Crowdfunding, which has an open ledger where you can check all the transactions made and their descriptions. The LFX Crowdfunding page where JanusGraph accepts donations is: https://crowdfunding.lfx.linuxfoundation.org/projects/janusgraph Best regards, Oleksandr Porunov on behalf of the JanusGraph TSC
Re: Performance Improvement
Laura Morales <lauretas@...>
There's a BUILDING file with instructions in the repo.
Sent: Monday, July 26, 2021 at 10:31 AM From: "Vinayak Bali" <vinayakbali16@...> To: janusgraph-users@... Subject: Re: [janusgraph-users] Performance Improvement
|
Re: Performance Improvement
Vinayak Bali
Hi Boxuan, Thank you for your response. I am not sure how I can build JanusGraph from the master branch. If you can share the steps/procedure to do so, I can check; otherwise I need to wait for the new release. My use case consists of a single node label with a self-relation between the nodes. You can consider it as a BOM (bill of materials) in the supply chain. The JanusGraph and Cassandra configurations are the defaults set during installation. The data loading script takes the CSV files as input, divides them into batches, and loads the batches using multi-threading. If you need more details, I can share a generic script with you and also the metrics. Thanks & Regards, Vinayak On Mon, Jul 26, 2021 at 1:38 PM Boxuan Li <liboxuan@...> wrote:
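The loading approach described above (split the CSV input into batches, load the batches from several threads) can be sketched roughly as follows. This is only an illustration under assumptions: `load_batch` is a hypothetical placeholder that merely counts rows; in a real loader it would open a JanusGraph transaction, add one vertex per row and commit once per batch rather than once per row.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batches(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def load_batch(batch):
    # Hypothetical placeholder: a real implementation would add the rows
    # of this batch to the graph inside one transaction and commit it.
    return len(batch)


def load_csv(text, batch_size=1000, workers=8):
    """Split CSV text into batches and load them on a thread pool."""
    reader = csv.DictReader(io.StringIO(text))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map submits every batch up front and yields results in order.
        return sum(pool.map(load_batch, batches(reader, batch_size)))
```

Note that `Executor.map` materializes all batches before the results are consumed, so for very large files a bounded queue of submissions would keep memory usage flat.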
|
|
Re: Performance Improvement
Boxuan Li
Hi Vinayak, Would you be able to build JanusGraph from the master branch and try again? The upcoming 0.6.0 release contains many optimizations that might help. Without more details of your use case (your queries, your loading script, your JanusGraph configs, your JanusGraph metrics, your Cassandra metrics), it's very hard to give any concrete suggestion. Anyway, I would strongly recommend trying the master version first and seeing how it goes. Best, Boxuan On Mon, Jul 26, 2021 at 3:55 PM, Vinayak Bali <vinayakbali16@...> wrote:
|
|
Performance Improvement
Vinayak Bali
Hi All, I have been using JanusGraph for a while. The use case I am working on consists of 1.5 million nodes and 3 million edges. I prepared a batch-loading Groovy script. The data loading script performs as follows: nodes: 5 min, edges: 13 min, total: 18 min. Also, the count query including edges takes minutes to execute. Both JanusGraph (0.5.2) and Cassandra are installed on the same instance. Hardware configuration: 92 GB RAM, 48 cores. I would like expert suggestions/steps that can be followed to improve the performance. Please share your thoughts on this. Thanks & Regards, Vinayak
|
Re: How to split graph in multiple graphml files and load them separately
Laura Morales <lauretas@...>
I've also noticed that GraphML files can specify an "id" for the <graph> node, but I guess this has no effect on Janus at all? Is it completely ignored? Am I right?
Sent: Monday, July 26, 2021 at 7:50 AM
Re: How to split graph in multiple graphml files and load them separately
Laura Morales <lauretas@...>
> Apparently, you have an external naming convention to recognize shared vertices

The convention is simply to use custom IDs in GraphML, like this:

<node id="data_source1:id0"/>
<node id="data_source1:id1"/>
...
<node id="data_source2:id0"/>
<node id="data_source2:id1"/>
...

When I "merge" all the nodes/edges of the two GraphML files into a single file and load the new file into Janus, Janus replaces all the IDs with its own Long values. Otherwise, all the vertices and edges are imported correctly; only the IDs are changed from String to Long. For my particular use case I don't mind the IDs being changed, but having to "merge" and reinsert the whole graph every time is really inconvenient and doesn't scale beyond a small graph. I need to "merge" all the files because if I load them separately, Janus will not treat two vertices with the same ID from two separate files as the same vertex; it will create 2 nodes and give them 2 different IDs. I feel like this problem probably wouldn't exist if the GraphML or GraphSON loaders used the user-defined IDs instead of replacing them with Longs.
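For reference, the "merge" step described above could be automated with a small script along these lines. It is a sketch using only Python's standard `xml.etree.ElementTree`, not a JanusGraph or TinkerPop feature, and it assumes all files use the standard GraphML namespace and compatible `<key>` declarations. Nodes are deduplicated by their custom `id`; if the same vertex appears in several files, missing `<data>` entries (properties) are copied onto the first copy.

```python
import xml.etree.ElementTree as ET

NS = "http://graphml.graphdrawing.org/xmlns"
ET.register_namespace("", NS)  # serialize without an "ns0:" prefix


def merge_graphml(documents):
    """Merge GraphML strings into one document, one <node> per id."""
    keys, nodes, edges = {}, {}, []
    graph_attrib = None
    for text in documents:
        root = ET.fromstring(text)
        for key in root.findall(f"{{{NS}}}key"):
            keys.setdefault(key.get("id"), key)
        graph = root.find(f"{{{NS}}}graph")
        if graph_attrib is None:
            graph_attrib = dict(graph.attrib)  # first file's <graph> attributes
        for node in graph.findall(f"{{{NS}}}node"):
            node_id = node.get("id")
            if node_id not in nodes:
                nodes[node_id] = node
            else:
                # Same vertex declared in several files: copy over any
                # <data> entries the first copy does not have yet.
                existing = {d.get("key") for d in nodes[node_id]}
                for data in node:
                    if data.get("key") not in existing:
                        nodes[node_id].append(data)
        edges.extend(graph.findall(f"{{{NS}}}edge"))
    merged = ET.Element(f"{{{NS}}}graphml")
    merged.extend(keys.values())  # <key> declarations must precede <graph>
    graph = ET.SubElement(merged, f"{{{NS}}}graph", graph_attrib or {})
    graph.extend(nodes.values())
    graph.extend(edges)
    return ET.tostring(merged, encoding="unicode")
```

This only automates the workaround; the merged file still has to be loaded as a whole with `readGraph`, so it does not address the incremental-update problem.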
|
Re: janusgraph and deeplearning
jonathan.mercier.fr@...
Hi Marc,
Thanks for your reply. I have knowledge data from multiple sources, so (i) I first have to load that data into JanusGraph, and (ii) I need to apply a reconciliation algorithm that generates the knowledge graph. Then I would like to train a graph neural network on this new graph with PyTorch or, if that is not possible, with Deeplearning4j (I prefer Python). Thanks
|
Re: How to split graph in multiple graphml files and load them separately
hadoopmarc@...
Hi Laura,
I do not see an easy solution. Although JanusGraph supports custom vertex ids, I do not believe this is compatible with the Gremlin IO readers (at least not out of the box; I tried...). An alternative collaboration model would be to set up Gremlin Server. Then you have the Gremlin language variants available (e.g. Python) to write new and modified data directly to a shared graph, without using GraphML files for transport. Apparently, you have an external naming convention to recognize shared vertices, so you could add the external names as properties and define a JanusGraph index for them. Best wishes, Marc
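To illustrate why keying vertices on an indexed external-name property makes separate, incremental loads work, here is a toy in-memory simulation in plain Python. `InMemoryGraph` and the `ext_id` property are hypothetical names; a dict stands in for the JanusGraph index. Against a real server, the get-or-create step would typically be the Gremlin idiom `g.V().has('ext_id', x).fold().coalesce(unfold(), addV().property('ext_id', x))`. An edge pointing into a not-yet-loaded part creates a placeholder vertex (as observed in this thread), but because the placeholder is found again by `ext_id`, it is filled in when the other part is loaded.

```python
import itertools


class InMemoryGraph:
    """Toy stand-in for a graph with an index on the 'ext_id' property."""

    def __init__(self):
        self._next_id = itertools.count(1)  # mimics generated Long ids
        self.by_ext_id = {}                 # plays the role of the index
        self.props = {}                     # internal id -> properties
        self.edges = set()                  # (out id, label, in id)

    def get_or_create(self, ext_id):
        # Same idea as fold().coalesce(unfold(), addV(...)) in Gremlin.
        if ext_id not in self.by_ext_id:
            vid = next(self._next_id)
            self.by_ext_id[ext_id] = vid
            self.props[vid] = {"ext_id": ext_id}
        return self.by_ext_id[ext_id]

    def load(self, vertices, edges):
        """Load one contributor's part of the graph.

        vertices: {ext_id: {property: value}}; edges: [(src, label, dst)].
        Re-loading a changed part updates properties in place. Edges are
        deduplicated by the set here; a real graph would need its own check.
        """
        for ext_id, props in vertices.items():
            self.props[self.get_or_create(ext_id)].update(props)
        for src, label, dst in edges:
            self.edges.add((self.get_or_create(src), label, self.get_or_create(dst)))
```

For example, loading `{"ds1:id0": {...}}` with an edge to `"ds2:id0"` first, and the part containing `"ds2:id0"` later, yields two vertices and one edge rather than a dangling duplicate.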
|
Re: janusgraph and deeplearning
hadoopmarc@...
Hi Jonathan,
Can you elaborate on why you make the connection between JanusGraph and deep learning? I can only imagine the wish to use graph data stored in JanusGraph to train a GNN. However, I do not think you can leverage the message passing of TinkerPop VertexPrograms, because it is Java-based and cannot use GPUs. Best wishes, Marc
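To make "message passing" concrete, here is a minimal plain-Python sketch of one round of mean aggregation over in-neighbours, which is roughly the per-vertex computation a GNN layer performs. It is a simplification under stated assumptions: a real GNN layer also applies learned weights and a nonlinearity, and frameworks such as PyTorch run this as batched tensor operations on the GPU, which is exactly what a Java VertexProgram does not give you.

```python
def message_passing_step(features, edges):
    """One round of mean-aggregation message passing.

    features: dict vertex -> list[float]; edges: iterable of (src, dst).
    Each vertex's new feature vector is the element-wise mean of its own
    vector and the vectors sent to it by its in-neighbours.
    """
    inbox = {v: [list(f)] for v, f in features.items()}
    for src, dst in edges:
        inbox[dst].append(features[src])  # src "sends" its features to dst
    return {
        v: [sum(col) / len(msgs) for col in zip(*msgs)]
        for v, msgs in inbox.items()
    }
```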
|
janusgraph and deeplearning
jonathan.mercier.fr@...
Dear all,
I am looking to use JanusGraph together with a deep learning framework such as PyTorch. Does anyone have experience with, or an example of, this? Currently I use Parquet -> DataFrame -> PyTorch. Thanks for your help
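One step that is needed whichever export path is used (e.g. edges exported from JanusGraph to Parquet and read into a DataFrame) is turning an edge list with arbitrary vertex ids into the compact coordinate (COO) form that GNN libraries expect. A sketch in plain Python, assuming edges arrive as `(out_vertex_id, in_vertex_id)` pairs; the ids in the example are made up. With PyTorch Geometric, the result would then typically become `torch.tensor(edge_index, dtype=torch.long)` with shape `[2, num_edges]`.

```python
def to_edge_index(edges):
    """Map vertex ids to 0..n-1 and return ([src...], [dst...]) plus the mapping.

    edges: iterable of (out_vertex_id, in_vertex_id) pairs, e.g. JanusGraph
    Long ids. The mapping is needed later to join node features/labels.
    """
    index = {}

    def idx(v):
        # Assign the next free position the first time a vertex is seen.
        if v not in index:
            index[v] = len(index)
        return index[v]

    src, dst = [], []
    for s, d in edges:
        src.append(idx(s))
        dst.append(idx(d))
    return [src, dst], index


edge_index, mapping = to_edge_index([(4120, 8312), (8312, 40968), (4120, 40968)])
```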
|
Fw: How to split graph in multiple graphml files and load them separately
Laura Morales <lauretas@...>
ERRATA
1. If I load one file and then the other, Janus does create the edges that have their "origin" in one file and their "target" in the other, but instead of linking to the vertex with that "id" from the other file, it creates a new empty vertex (with no properties) and assigns it a new ID.
|
How to split graph in multiple graphml files and load them separately
Laura Morales <lauretas@...>
Assume that my colleagues and I are working on different "parts" of the same graph: each of us creates one GraphML file, and then we'd like to load our files into the graph (we're using .readGraph("file.graphml")). My problems are:
1. If I load one file, then the other, Janus will not create the edges that have their "origin" in one file and their "target" in another, because I guess it does not find the target vertex in the same file. Janus assigns its own IDs, so it looks like we have to "merge" all the files into one before inserting data into the graph.
2. Because of 1, we cannot "update" only the part of the graph whose file has changed; instead I have to recreate the whole graph every time.
I'd like to hear your comments on how we could organize a collaboration like this, i.e. people working on different parts of the same graph, merging them together, and updating only the parts that have changed. "readGraph" is very useful because a file can be loaded in one line without having to write any custom Groovy scripts for parsing all the files. Thank you.
|
Re: Tinkerpop 3.4.1 with Hadoop3
Hi Anjani,
JanusGraph receives its Hadoop dependency from Apache TinkerPop, so Apache TinkerPop is in the driver's seat regarding an upgrade of Hadoop. Making a custom build of JanusGraph with Hadoop 3 would be very time-consuming, because many library conflicts exist that need to be resolved manually and tested. If you try, be sure to start from a JanusGraph branch that uses TinkerPop 3.5.x with Spark 3.0 (i.e. JanusGraph master or the upcoming 0.6.0) to minimize the chance of library conflicts. After a quick search, I find that Hadoop 3 services (resource manager, name node) support Hadoop 2 clients to a certain extent. What errors do you get when trying janusgraph-0.5.3 on a Hadoop 3 cluster? Note that TinkerPop/JanusGraph are not shipped with hadoop-yarn, see: https://tinkerpop.apache.org/docs/current/recipes/#olap-spark-yarn Best wishes, Marc [Edited] Adding the missing spark-yarn and hadoop-yarn jars is described here.
|
Re: Query requires iterating over all vertices
The difference between “text” and “string” is explained here: https://docs.janusgraph.org/index-backend/text-search/
In short, “text” provides full-text (tokenized) search, while “string” provides whole-string matching. You can also take a look at https://codecurated.com/blog/elasticsearch-text-vs-keyword which explains Elasticsearch’s Text type (corresponding to “text” in JanusGraph) and Keyword type (corresponding to “string” in JanusGraph). The same applies to Lucene.
Regarding your use case, has(“country”) is interpreted as a “TraversalFilter” step in Gremlin, unlike has(“country”, “value”), which is interpreted as a “GraphStep”. Before 0.6.0, JanusGraph only applied indexes when it saw a “GraphStep”. Starting from 0.6.0 (not yet released), indexes are also applied to “has(key)” steps. This is just FYI.
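The practical difference between the two mappings can be shown with a deliberately simplified model in plain Python. The tokenizer below is a rough assumption (lowercase, split on non-word characters), not the actual Elasticsearch or Lucene analyzer. With a “text” mapping, the index stores individual tokens, so a token query such as JanusGraph's `textContains` matches, while an exact-value lookup of the whole string has nothing to match against. With a “string” mapping, the whole value is a single term. (JanusGraph also offers a `TEXTSTRING` mapping that indexes both ways, if both kinds of query are needed.)

```python
import re


def tokenize(value):
    """Rough stand-in for a full-text analyzer: lowercase and split."""
    return [t for t in re.split(r"\W+", value.lower()) if t]


def text_contains(value, term):
    # "text" mapping: the value is indexed as separate tokens.
    return term.lower() in tokenize(value)


def string_eq(value, query):
    # "string" mapping: the whole value is one indexed term.
    return value == query
```

For example, `text_contains("United States", "states")` is true, but `text_contains("United States", "United States")` is false because no single token equals the whole phrase; `string_eq` behaves the other way round.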
Best,
Boxuan
|
|
New TSC member: Boxuan Li
On behalf of the JanusGraph Technical Steering Committee (TSC), I'm pleased to welcome a new Technical Steering Committee member to the project!
Boxuan Li has made major contributions and has demonstrated an ongoing commitment to the project. TSC membership enables Boxuan to assist with project management and to help guide the direction of the project. Congratulations, Boxuan Li!
|
Re: Query requires iterating over all vertices
Laura Morales <lauretas@...>
> Looks like this is because you are indexing your property as text. See https://docs.janusgraph.org/index-backend/text-search/#full-text-search_1

Thank you so much! I was using the newest release, but compiling the latest source (from the main branch) did indeed fix the problem. Or at least I think it did, since I no longer get any warnings. On the other hand, I'm absolutely confused and I don't know what I'm doing. I'm new to Janus, and the process of creating these indexes seems overly complex. What's the difference between "text" and "string"? And why wasn't it working with the previous release?
|