bulk loading error


Elizabeth <hlf...@...>
 

Hi Marc,

Per your request, I am posting this here :)

Thanks so much! I indeed followed "the Powers of Ten" approach, and made the loading even simpler -- it does not check whether a vertex already exists, since I deduplicated the data beforehand. Here is the code; it just reads each line and calls addVertex row by row:

 def loadTestSchema(graph) {
    g = graph.traversal()

    t = System.currentTimeMillis()
    new File("/home/dev/wanmeng/adjlist/vertices1000000.txt").eachLine { l ->
        p = l                                        // p is the raw line read from the file
        graph.addVertex(label, "userId", "uid", p)   // one vertex per line
    }
    graph.tx().commit()   // single commit after all vertices are added

    u = System.currentTimeMillis() - t
    print u / 1000 + " seconds \n"
    g = graph.traversal()
    g.V().has('uid', 1)   // spot-check lookup against the byuid index

}
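
One thing I am unsure about: everything goes into a single transaction that is only committed at the very end. Would committing in batches keep memory bounded? A sketch of what I mean (the 10,000 batch size is just a guess):

def loadBatched(graph) {
    int count = 0
    new File("/home/dev/wanmeng/adjlist/vertices1000000.txt").eachLine { l ->
        graph.addVertex(label, "userId", "uid", l)
        if (++count % 10000 == 0) {
            graph.tx().commit()   // flush the open transaction periodically
        }
    }
    graph.tx().commit()   // commit the final partial batch
}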

The schema is as follows:
def defineTestSchema(graph) {
    mgmt = graph.openManagement()
    g = graph.traversal()
    // vertex labels
    userId= mgmt.makeVertexLabel("userId").make()
    // edge labels
    relatedby = mgmt.makeEdgeLabel("relatedby").make()
    // vertex and edge properties
    uid = mgmt.makePropertyKey("uid").dataType(Long.class).cardinality(Cardinality.SET).make()
    // global indices
    //mgmt.buildIndex("byuid", Vertex.class).addKey(uid).indexOnly(userId).buildCompositeIndex()
    mgmt.buildIndex("byuid", Vertex.class).addKey(uid).buildCompositeIndex()
    mgmt.commit()

    //mgmt = graph.openManagement()
    //mgmt.updateIndex(mgmt.getGraphIndex('byuid'), SchemaAction.REINDEX).get()
    //mgmt.commit()
}
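
For completeness, I invoke everything from the Gremlin console roughly like this (the properties path is from my setup):

graph = JanusGraphFactory.open('conf/janusgraph-hbase-es.properties')
defineTestSchema(graph)
loadTestSchema(graph)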

The configuration file is janusgraph-hbase-es.properties:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=hbase
storage.batch-loading=true
schema.default=none
storage.hostname=127.0.0.1
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5

index.search.elasticsearch.interface=TRANSPORT_CLIENT
index.search.backend=elasticsearch
index.search.hostname=127.0.0.1
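
Given the ConsistentKeyIDAuthority warning below, I also wonder whether id block allocation needs tuning for bulk loads, e.g. (the value is a guess based on the docs):

# larger id blocks mean fewer of the slow id-block acquisitions during bulk loads
ids.block-size=1000000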

However, the loading time is still very long.

100:        0.026 seconds
10k:        49.001 seconds
100k:       35.827 seconds
1 million:  379.05 seconds
10 million: error
gremlin> loadTestSchema(graph)
15:59:27 WARN  org.janusgraph.diskstorage.idmanagement.ConsistentKeyIDAuthority  - Temporary storage exception while acquiring id block - retrying in PT0.6S: org.janusgraph.diskstorage.TemporaryBackendException: Wrote claim for id block [2880001, 2960001) in PT2.213S => too slow, threshold is: PT0.3S
GC overhead limit exceeded
Type ':help' or ':h' for help.
Display stack trace? [yN]y
java.lang.OutOfMemoryError: GC overhead limit exceeded

What I am wondering is:
1) Why does bulk loading not seem to take effect even though I have already set storage.batch-loading=true? What else should I set? Do I need to drop the index in order to speed up bulk loading? (The ids.block-size sketch after the configuration above is my current guess.)
2) How do I solve the GC overhead limit exceeded error? Would committing in batches, as in the sketch after the loading function above, help?

3) In parallel, I am using Kryo + BulkLoaderVertexProgram to load the data, but the last step failed:

gremlin> graph.compute(SparkGraphComputer).program(blvp).submit().get()
No signature of method: org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.program() is applicable for argument types: (org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram$Builder) values: [org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram$Builder@6bb4cc0e]
Possible solutions: program(org.apache.tinkerpop.gremlin.process.computer.VertexProgram), profile(java.util.concurrent.Callable)

Do I need to install TinkerPop 3 separately from JanusGraph in order to use graph.compute(SparkGraphComputer).program(blvp).submit().get()?
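
Rereading the error, blvp seems to be a BulkLoaderVertexProgram$Builder rather than a VertexProgram, so maybe the builder just needs a final create(graph). A sketch of what I mean (the loader class and the writeGraph path are assumptions from my setup):

blvp = BulkLoaderVertexProgram.build().
        bulkLoader(OneTimeBulkLoader).
        writeGraph('conf/janusgraph-hbase-es.properties').
        create(graph)   // create(graph) returns a VertexProgram, which program() accepts
graph.compute(SparkGraphComputer).program(blvp).submit().get()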

Many thanks!

Eliz
