Loading 10k nodes on JanusGraph/BerkeleyDB
Damien Seguy <damie...@...>
I'm running JanusGraph 0.1.1 on OS X, with BerkeleyDB as the backend. The JVM is started with -Xms256m and -Xmx5g.
I'm trying to load a GraphSON file into JanusGraph. I have several GraphSON files of various sizes.
When the GraphSON file is below 10k nodes, the load usually goes well. It is much faster with 200 tokens than with 9000 (which sounds normal).
When I reach 10k tokens, something goes wrong and BerkeleyDB emits a lot of errors:
176587 [pool-6-thread-1] WARN org.janusgraph.diskstorage.log.kcvs.KCVSLog - Could not read messages for timestamp [2017-06-27T16:28:42.502Z] (this read will be retried)
org.janusgraph.core.JanusGraphException: Could not execute operation due to backend exception
Caused by: com.sleepycat.je.ThreadInterruptedException: (JE 7.3.7) Environment must be closed, caused by: com.sleepycat.je.ThreadInterruptedException: Environment invalid because of previous exception: (JE 7.3.7) db/berkeley java.lang.InterruptedException THREAD_INTERRUPTED: InterruptedException may cause incorrect internal state, unable to continue. Environment is invalid and must be closed.
The load script is simple:
There are no indexes (yet).
Sometimes I manage to query the graph from another connection while the load is still running (it never finishes), and g.V().count() returns 10000.
This looks like a transaction or batch size limit, but I don't know what to do with that information.
I'm sure there is something huge that I'm missing. Any pointers would be helpful.