Re: How to split graph in multiple graphml files and load them separately


Laura Morales <lauretas@...>
 

I've also noticed that GraphML files can specify an "id" for the <graph> element, but I guess this has no effect on Janus at all? Is it completely ignored?

Sent: Monday, July 26, 2021 at 7:50 AM
From: "Laura Morales" <lauretas@...>
To: janusgraph-users@...
Cc: janusgraph-users@...
Subject: Re: [janusgraph-users] How to split graph in multiple graphml files and load them separately

Apparently, you have an external naming convention to recognize shared vertices.
The convention is simply to use custom IDs in GraphML, like this:

<node id="data_source1:id0"/>
<node id="data_source1:id1"/>
...

<node id="data_source2:id0"/>
<node id="data_source2:id1"/>
...

When I "merge" all the nodes and edges of the two GraphML files into a single file and load it into Janus, Janus replaces all the IDs with its own Long values, but the vertices and edges are otherwise imported correctly; only the IDs change from String to Long. For my particular use case I don't mind the IDs being changed, but having to "merge" and reinsert the whole graph every time is really inconvenient and doesn't scale beyond a small graph. I need to "merge" the files because if I load them separately, Janus does not treat two vertices with the same ID from two separate files as the same vertex; it creates two vertices and gives them two different IDs.
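For reference, the "merge" step I'm describing can be sketched roughly like this in Python, using only the standard library. This is a simplified illustration, not my exact script: the function name merge_graphml and the "merged" graph id are made up for the example, <key> declarations are not carried over, and when the same node id appears in several files only the first occurrence is kept.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace; registering it keeps the output free of ns0: prefixes.
NS = "http://graphml.graphdrawing.org/xmlns"
ET.register_namespace("", NS)


def merge_graphml(documents):
    """Merge GraphML documents (given as strings) into one GraphML string.

    Nodes that share an id are emitted once; edges are copied unchanged.
    """
    root = ET.Element(f"{{{NS}}}graphml")
    graph = ET.SubElement(
        root, f"{{{NS}}}graph", {"id": "merged", "edgedefault": "directed"}
    )
    seen_ids = set()
    edges = []
    for text in documents:
        source = ET.fromstring(text)
        for node in source.iter(f"{{{NS}}}node"):
            node_id = node.get("id")
            if node_id not in seen_ids:  # skip vertices already seen in an earlier file
                seen_ids.add(node_id)
                graph.append(node)
        edges.extend(source.iter(f"{{{NS}}}edge"))
    graph.extend(edges)  # edges go after all nodes
    return ET.tostring(root, encoding="unicode")
```

The output is a single GraphML document that can be loaded in one pass, which is exactly the part that has to be redone from scratch every time one of the source files changes.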
I feel like this problem probably wouldn't exist if the GraphML or GraphSON loaders used the user-defined IDs instead of replacing them with Longs.
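To make the behaviour I'm after concrete, here is a toy sketch in plain Python. It is not JanusGraph code: ToyGraph, get_or_create and load are hypothetical names, and the dictionary lookup only stands in for whatever lookup a real loader would need (presumably a query on an indexed property that stores the original GraphML id). The store still assigns its own integer IDs, but it remembers the original id, so files loaded separately end up sharing vertices.

```python
import xml.etree.ElementTree as ET
from itertools import count

NS = "{http://graphml.graphdrawing.org/xmlns}"


class ToyGraph:
    """In-memory stand-in for a graph store that assigns its own integer IDs."""

    def __init__(self):
        self._next_id = count(1)
        self.vertices = {}      # internal id -> original GraphML id
        self.by_source_id = {}  # original GraphML id -> internal id
        self.edges = []         # (internal source id, internal target id)

    def get_or_create(self, source_id):
        """Return the internal id for source_id, creating the vertex if needed."""
        if source_id not in self.by_source_id:
            internal_id = next(self._next_id)
            self.by_source_id[source_id] = internal_id
            self.vertices[internal_id] = source_id
        return self.by_source_id[source_id]

    def load(self, graphml_text):
        """Load one GraphML document, reusing vertices from earlier loads."""
        doc = ET.fromstring(graphml_text)
        for node in doc.iter(NS + "node"):
            self.get_or_create(node.get("id"))
        for edge in doc.iter(NS + "edge"):
            self.edges.append(
                (
                    self.get_or_create(edge.get("source")),
                    self.get_or_create(edge.get("target")),
                )
            )
```

With something like this, calling load() once per file would give the same graph as loading the merged file, without having to rebuild everything each time.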
