Issues with loading data into JanusGraph
Junfei Hu <hujunf...@...>
I have billions of vertices and edges to load into JanusGraph. Writing them one at a time is too slow for me, so I've been looking for ways to do batch loading. Here are the methods and tools I know of so far.
1. Write vertices and edges into JanusGraph directly. This is too slow.
2. Enable the batch-loading option, then write data into JanusGraph.
When loading a very large number of vertices (say, billions), the VertexCache occupies too much JVM heap memory and causes an OOM error.
The VertexCache cleanup policy wakes the cleanup thread only when the query-miss count exceeds 5, and loading data never triggers a query miss.
I don't know how to use this for batch loading. The examples in the TinkerPop docs seem to be of no help.
It can actually do batch loading; I've read the source code of its implementation. But it seems to be the same as committing after every batch-size calls to addVertex(), as in 1 or 2 above, so it should have the same VertexCache problems.
It requires user-supplied ids. Will these conflict with JanusGraph-generated ids?
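To make the VertexCache concern in item 2 and the batch-size comparison concrete, here is a minimal Java sketch of a simplified, hypothetical model: a cache that is cleaned only after more than 5 lookup misses, fed by a loop that only adds new vertices and "commits" every `batchSize` additions. `BatchLoadCacheModel`, `VertexCacheModel`, `putNew`, and `load` are illustrative names, not JanusGraph APIs. The model assumes, as described above, that the cache is not reset on commit; it does not reproduce JanusGraph's actual VertexCache implementation.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the behaviour described in the post.
// NOT JanusGraph's VertexCache: names, capacity, and structure are illustrative only.
public class BatchLoadCacheModel {

    // Cleanup runs only once the miss count exceeds this value (the "> 5" policy above).
    static final int MISS_THRESHOLD = 5;

    static final class VertexCacheModel {
        private final Map<Long, String> entries = new LinkedHashMap<>();
        private final int capacity;
        private int missCount = 0;

        VertexCacheModel(int capacity) {
            this.capacity = capacity;
        }

        // A lookup that misses bumps the miss counter and may trigger cleanup.
        String get(long id) {
            String v = entries.get(id);
            if (v == null && ++missCount > MISS_THRESHOLD) {
                cleanUp();
            }
            return v;
        }

        // Newly created vertices go straight into the cache; no lookup, so no miss.
        void putNew(long id, String vertex) {
            entries.put(id, vertex);
        }

        // Evict the oldest entries down to capacity, then reset the miss counter.
        private void cleanUp() {
            Iterator<Long> it = entries.keySet().iterator();
            while (entries.size() > capacity && it.hasNext()) {
                it.next();
                it.remove();
            }
            missCount = 0;
        }

        int size() {
            return entries.size();
        }
    }

    // Add `total` new vertices, "committing" every `batchSize` additions.
    // Returns the number of commits; the cache itself is never touched by a commit.
    static int load(VertexCacheModel cache, int total, int batchSize) {
        int commits = 0;
        int pending = 0;
        for (long id = 0; id < total; id++) {
            cache.putNew(id, "v" + id); // stands in for addVertex()
            if (++pending == batchSize) {
                commits++;              // stands in for a commit
                pending = 0;
            }
        }
        if (pending > 0) {
            commits++;                  // flush the final partial batch
        }
        return commits;
    }

    public static void main(String[] args) {
        VertexCacheModel cache = new VertexCacheModel(1_000);
        int commits = load(cache, 100_000, 10_000);
        System.out.println("commits=" + commits + " cacheSize=" + cache.size());
    }
}
```

Running `main` loads 100,000 vertices in batches of 10,000 and prints `commits=10 cacheSize=100000`: under this model, batching the commits does not shrink the cache, because the load path never records a miss. Six lookups of absent ids would trigger `cleanUp()` and trim it back to the 1,000-entry capacity.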
Have I misunderstood anything above?
Are there any other tools or methods for batch loading?