How to upload rdf bulk data to janus graph
Arpan Jain <arpan...@...>
I have data in RDF (TTL) format, around 6 million triples. I am currently using the rdf2gremlin Python script for the conversion, but it is taking too much time: loading 10k records took around 1 hour. I am using ScyllaDB as the JanusGraph backend. Below is the Python code I am using.
    from rdf2g import setup_graph
    import rdflib
    import pathlib

    DEFAULT_LOCAL_CONNECTION_STRING = "ws://localhost:8182/gremlin"
    g = setup_graph(DEFAULT_LOCAL_CONNECTION_STRING)

    OUTPUT_FILE_LAM_PROPERTIES = pathlib.Path("path/to/ttl/file/.ttl").resolve()
    rdf_graph = rdflib.Graph()
    rdf_graph.parse(str(OUTPUT_FILE_LAM_PROPERTIES), format="ttl")

Loading the same RDF data into Neo4j takes only around 10 minutes for the whole dataset, but I want to use JanusGraph. Kindly suggest the best way to upload bulk RDF data to JanusGraph using Python or Java.
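Editorial sketch: one possible contributor to slow loads over a WebSocket connection is sending one request per triple. Whether rdf2gremlin does this is an assumption and is not confirmed anywhere in this thread. The sketch below only shows the chunking side: it groups triples into fixed-size batches and hands each batch to a caller-supplied function. The names `batched`, `load_in_batches`, and `write_batch` are hypothetical, and how a batch is actually written to JanusGraph is deliberately left to the caller.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple

# A triple as plain strings: (subject, predicate, object).
Triple = Tuple[str, str, str]


def batched(items: Iterable[Triple], size: int) -> Iterator[List[Triple]]:
    """Yield successive lists holding at most `size` triples each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def load_in_batches(
    triples: Iterable[Triple],
    write_batch: Callable[[List[Triple]], None],
    batch_size: int = 1000,
) -> int:
    """Call `write_batch` once per chunk and return how many triples were handed over.

    `write_batch` is a hypothetical callback supplied by the caller; it is
    where a single request or transaction against the graph would go.
    """
    total = 0
    for batch in batched(triples, batch_size):
        write_batch(batch)
        total += len(batch)
    return total
```

With the `rdf_graph` from the code above, the triples could be fed in as `((str(s), str(p), str(o)) for s, p, o in rdf_graph)`. The batch size of 1000 is an arbitrary starting point, not a recommendation from the thread.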
|
"alex...@gmail.com" <alexand...@...>
Hi,

Try enabling batch loading: "storage.batch-loading=true".
Increase your batch mutations buffer: "storage.buffer-size=20480".
Increase the ID block size: "ids.block-size=10000000".
I'm not sure whether your flow only adds data or upserts it. If it upserts, you may also set "query.batch=true".

That said, I haven't used rdf2gremlin, so I can't suggest much more. The settings above are just the options I can immediately think of; a proper investigation would be needed to recommend specific performance improvements. You may additionally tune ScyllaDB for your use case.

Best regards,
Oleksandr

On Thursday, December 24, 2020 at 12:24:10 PM UTC+2 ar...@... wrote:
|
Arpan Jain <arpan...@...>
I need to set all of these properties in the JanusGraph properties file, right? I mean the config file the server starts with, where we set the storage backend, host, etc.
|
|
"alex...@gmail.com" <alexand...@...>
That's right
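Editorial sketch: the options suggested earlier in the thread, written out as lines for the properties file the server starts with. The keys and values are copied from that reply; the helper `render_properties` and the two dictionary names are hypothetical, and the values are starting points from the thread rather than tuned recommendations.

```python
from typing import Dict

# Settings suggested earlier in the thread (values copied as given there).
BULK_LOAD_SETTINGS: Dict[str, str] = {
    "storage.batch-loading": "true",
    "storage.buffer-size": "20480",
    "ids.block-size": "10000000",
}

# Only suggested in the thread for the case where the load upserts data.
UPSERT_SETTINGS: Dict[str, str] = {
    "query.batch": "true",
}


def render_properties(settings: Dict[str, str]) -> str:
    """Render settings as key=value lines, one per line, in insertion order."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())


if __name__ == "__main__":
    # Print the lines so they can be pasted into the properties file by hand.
    print(render_properties(BULK_LOAD_SETTINGS), end="")
```

The script prints the lines instead of editing a file, because the existing properties file may already define some of these keys.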
On Thursday, December 24, 2020 at 1:43:42 PM UTC+2 ar...@... wrote:
|
|
Arpan Jain <arpan...@...>
Actually, I have around 70 fields. So my question is: is it possible to insert some data without bulk upload, so that JanusGraph creates its own schema, and later load the remaining data with bulk upload set to true? Will this process give an error?
|