bulk loading error
Elizabeth <hlf...@...>
Hi Marc,

This is in response to your request to post here :) Thanks so much!

I did follow "the powers of ten", and made the load even simpler: I do not check whether a vertex already exists, since I have done that beforehand. Here is the code, which just reads the file and calls addVertex row by row:

    def loadTestSchema(graph) {
        g = graph.traversal()
        t=System.
        new File("/home/dev/
        graph.tx().commit()
        u = System.
        print u/1000+" seconds \n"
        g = graph.traversal()
        g.V().has('uid', 1)
    }

The schema is as follows:

    def defineTestSchema(graph) {
        mgmt = graph.openManagement()
        g = graph.traversal()
        // vertex labels
        userId= mgmt.makeVertexLabel("userId")
        // edge labels
        relatedby = mgmt.makeEdgeLabel("relatedby"
        // vertex and edge properties
        uid = mgmt.makePropertyKey("uid").
        // global indices
        //mgmt.buildIndex("byuid", Vertex.class).addKey(uid).
        mgmt.buildIndex("byuid", Vertex.class).addKey(uid).
        mgmt.commit()
        //mgmt = graph.openManagement()
        //mgmt.updateIndex(mgmt.
        //mgmt.commit()
    }

The configuration file is janusgraph-hbase-es.properties:

    gremlin.graph=org.janusgraph.
    storage.backend=hbase
    storage.batch-loading=true
    schema.default=none
    storage.hostname=127.0.0.1
    cache.db-cache = true
    cache.db-cache-clean-wait = 20
    cache.db-cache-time = 180000
    cache.db-cache-size = 0.5
    index.search.elasticsearch.
    index.search.backend=
    index.search.hostname=127.0.0.

However, the loading time is still very long:

    100:        0.026 seconds
    10k:        49.001 seconds
    100k:       35.827 seconds
    1 million:  379.05 seconds
    10 million: error

    gremlin> loadTestSchema(graph)
    15:59:27 WARN org.janusgraph.
    GC overhead limit exceeded
    Type ':help' or ':h' for help.
    Display stack trace? [yN]y
    java.lang.OutOfMemoryError: GC overhead limit exceeded

What I am wondering is:

1) Why does bulk loading not seem to work, even though I have already set storage.batch-loading=
2) How can I solve the "GC overhead limit exceeded" error?
3) At the same time, I am using Kryo + BulkLoaderVertexProgram to load the data, and the last step failed:

    gremlin> graph.compute(
    No signature of method: org.apache.tinkerpop.gremlin.
    Possible solutions: program(org.apache.tinkerpop.

Do I need to install TinkerPop 3 in addition to JanusGraph to use this graph.compute(

Many thanks!
Eliz
Ted Wilmes <twi...@...>
Hi Eliz,

For your first code snippet, you'll need to add in a periodic commit every X number of vertices instead of after you've loaded the whole file. That X will vary depending on your hardware, etc. but you can experiment and find what gives you the best performance. I'd suggest starting at 100 and going from there. Once you get that working, you could try loading data in parallel by spinning up multiple threads that are addV'ing and periodically committing.

For the second approach, using the TinkerPop BulkLoaderVertexProgram, you do not need to download TP separately. I think from looking at your stacktrace, you may just be missing a bit when you constructed the vertex program. Did you call create at the end of its construction like in this little snippet?

    blvp = BulkLoaderVertexProgram.build().
        bulkLoader(OneTimeBulkLoader).
        writeGraph(writeGraphConf).create(modern)

Create takes the input graph that you're reading from as an argument.

--Ted

On Sunday, June 25, 2017 at 8:48:57 PM UTC-5, Elizabeth wrote:
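
As an illustration of the periodic-commit advice in Ted's reply, here is a minimal, library-independent sketch in Java. The class name PeriodicCommit, the method load, and its parameters are hypothetical and do not come from JanusGraph or TinkerPop. In a real loader, the addRow callback would presumably wrap the addVertex call and the commit callback would wrap graph.tx().commit(); the batch size of 100 is only the suggested starting point and would need tuning.

```java
import java.util.List;
import java.util.function.Consumer;

public class PeriodicCommit {

    /**
     * Passes each row to addRow and runs commit after every batchSize rows,
     * plus once more at the end if a partial batch is still pending.
     * Returns the number of commits issued.
     * (Hypothetical helper: addRow and commit stand in for the real graph calls.)
     */
    static <T> int load(Iterable<T> rows, int batchSize,
                        Consumer<T> addRow, Runnable commit) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int pending = 0;   // rows added since the last commit
        int commits = 0;
        for (T row : rows) {
            addRow.accept(row);
            if (++pending == batchSize) {
                commit.run();      // flush this batch so it does not pile up in memory
                commits++;
                pending = 0;
            }
        }
        if (pending > 0) {         // trailing partial batch
            commit.run();
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        // Stand-in data and no-op callbacks, for demonstration only.
        List<String> rows = List.of("1", "2", "3", "4", "5");
        int commits = load(rows, 2, row -> { }, () -> { });
        System.out.println(commits + " commits");   // prints "3 commits"
    }
}
```

Committing in small batches keeps each transaction's uncommitted state bounded, which is the likely reason a single commit at the end of a 10-million-row file runs into the GC overhead limit.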
HadoopMarc <m.c.d...@...>
And this was the answer that Eliz referred to above:

Hi Eliz,

Good to hear that you are making progress. I do not see this post on the gremlin users list. Would you be so kind as to post it there? I'll then add the answers below.

As to your questions:

On Monday, June 26, 2017 at 15:30:03 UTC+2, Ted Wilmes wrote: