Hi Jason,
Please see my comments below.
I had created the schema beforehand; it's an empty graph when I start loading the data.
I had increased ids.block-size to 100K (the default is 10K). The JanusGraph docs say that, as a rule of thumb, you should increase it 10x if you are doing bulk loads, but I have since changed it back to the default 10K.
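Since ids.block-size is a GLOBAL_OFFLINE setting (see the override warning in the log below), changing it in the local properties file doesn't take effect; it has to go through the ManagementSystem. A sketch of how I understand that would look, assuming the properties path from my script and that this is the only open JanusGraph instance:

```groovy
// open the graph directly (not through HadoopGraph) while no other instance is open
graph = JanusGraphFactory.open('/home/ubuntu/janusgraphdocker/conf/janusgraph-cassandra-es.properties')
mgmt = graph.openManagement()
mgmt.set('ids.block-size', 100000)   // 10x the 10K default, per the bulk-loading rule of thumb
mgmt.commit()
graph.close()
```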
script_challenge.groovy
def parse(line, factory) {
    def (cId, cType, cUserId, cCreatedDate) = line.split(/,/).toList()
    def v1 = factory.vertex(cId, "Challenge") // first value is always the id
    v1.property("challengeId", cId)
    v1.property("challengeType", Short.parseShort(cType))
    v1.property("creatorUserId", Long.parseLong(cUserId))
    // "MM" is month; lowercase "mm" would parse minutes
    v1.property("challengeCreatedDate", Date.parse("yyyy-MM-dd", cCreatedDate))
    return v1
}
Sample data: I have about 200MM rows of such data. I am loading about 1.4MM rows when I get the errors:
challenge-1,2,2,2016-04-04
challenge-2,1,1,2016-04-03
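Since FastNoSuchElementException often comes from a failed lookup, a more defensive version of parse that skips malformed lines instead of failing the task might be worth trying (a sketch; ScriptInputFormat is supposed to skip a line when parse returns null, and the fields are the same as above):

```groovy
def parse(line, factory) {
    def fields = line.split(/,/)
    if (fields.length != 4) return null   // skip malformed or short lines
    def (cId, cType, cUserId, cCreatedDate) = fields.toList()
    def v1 = factory.vertex(cId, "Challenge")
    v1.property("challengeId", cId)
    v1.property("challengeType", Short.parseShort(cType))
    v1.property("creatorUserId", Long.parseLong(cUserId))
    v1.property("challengeCreatedDate", Date.parse("yyyy-MM-dd", cCreatedDate))
    return v1
}
```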
In the error output, FastNoSuchElementException happens after I get the executor memory error, so I believe the memory error is triggering it.
It mostly does a partial commit: out of my 1.4MM rows, I can load about 200K at a time before I get an error.
Steps I took to isolate the error source:
1) Increased max and initial heap size on EC2. java -XX:+PrintFlagsFinal -Xms2g -Xmx21g -version | grep HeapSize
2) Changed gremlin.sh to launch with more memory (last line in gremlin.sh): exec $JAVA -Xmx16g $JAVA_OPTIONS $MAIN_CLASS "$@"
3) Increased the memory for executors
spark.master=local[4]
spark.executor.memory=3g
My observations
- The graph.addVertex method is not supported for this batch upload (TinkerPop 3.1.8 and 3.2.6 support it); I have to use the older factory.vertex method.
- After trying all these combinations and some others, I was still getting the same executor out-of-memory error.
I think the problem is with using bulkLoader(OneTimeBulkLoader) in
BulkLoaderVertexProgram.build().bulkLoader(OneTimeBulkLoader).writeGraph("/home/ubuntu/janusgraphdocker/conf/janusgraph-cassandra-es.properties").create(graph)
If I go with the default (IncrementalBulkLoader) by using this:
blvp = BulkLoaderVertexProgram.build().writeGraph("/home/ubuntu/janusgraphdocker/conf/janusgraph-cassandra-es.properties").create(graph)
I don't get the Spark executor memory error.
But using this, I get warning messages:
WARN org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Query requires iterating over all vertices
And the insert is very slow.
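If I stay with the incremental loader, I suspect a composite index on the bulkLoader.vertex.id key would stop those full vertex scans. A sketch of what I have in mind — the key name is the IncrementalBulkLoader default, and I'm assuming String as the data type since my ids look like challenge-1:

```groovy
graph = JanusGraphFactory.open('/home/ubuntu/janusgraphdocker/conf/janusgraph-cassandra-es.properties')
mgmt = graph.openManagement()
// key the IncrementalBulkLoader uses to look up previously loaded vertices
blid = mgmt.makePropertyKey('bulkLoader.vertex.id').dataType(String.class).make()
mgmt.buildIndex('byBulkLoaderVertexId', Vertex.class).addKey(blid).buildCompositeIndex()
mgmt.commit()
graph.close()
```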
So, I think the problem is bulkLoader(OneTimeBulkLoader). Any thoughts?
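One more knob I may try on the memory side: BulkLoaderVertexProgram's intermediateBatchSize, which commits mutations in smaller batches rather than once per partition. A sketch, with the batch size just a guess:

```groovy
blvp = BulkLoaderVertexProgram.build().
        bulkLoader(OneTimeBulkLoader).
        intermediateBatchSize(10000).   // commit every 10K mutations to bound transaction size
        writeGraph("/home/ubuntu/janusgraphdocker/conf/janusgraph-cassandra-es.properties").
        create(graph)
```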
Thanks for looking into it. I appreciate it.
On Sunday, October 8, 2017 at 12:02:46 PM UTC-7, Jason Plurad wrote:
org.apache.tinkerpop.gremlin.process.traversal.util.FastNoSuchElementException usually indicates you're doing an unchecked call to next() in your traversal. Are you able to share script_challenge.groovy? Also it seems like you should sort out what's going on with ids.block-size. I'd recommend creating the empty graph before running the bulk loader.
Have you tried increasing the executor memory?
On Sunday, October 8, 2017 at 2:23:55 PM UTC-4, amarkanday wrote:
I keep running into an error.
Script
start = new Date();
hdfs.copyFromLocal("/home/ubuntu/example/data/ctest.csv","data/ctest.csv")
hdfs.copyFromLocal("/home/ubuntu/example/scripts/script_challenge.groovy","scripts/script_challenge.groovy")
graph = GraphFactory.open("/home/ubuntu/janusgraphdocker/conf/hadoop-script.properties")
blvp = BulkLoaderVertexProgram.build().bulkLoader(OneTimeBulkLoader).writeGraph("/home/ubuntu/janusgraphdocker/conf/janusgraph-cassandra-es.properties").create(graph)
graph.compute(SparkGraphComputer).program(blvp).submit().get()
stop = new Date();
The input file is about 1.7MM nodes with 3 properties each.
hadoop-script properties
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.script.ScriptInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=data/ctest.csv
gremlin.hadoop.scriptInputFormat.script=scripts/script_challenge.groovy
gremlin.hadoop.outputLocation=output
####################################
# SparkGraphComputer Configuration #
####################################
spark.master=local[4]
spark.executor.memory=4g
spark.serializer=org.apache.spark.serializer.KryoSerializer
Error
graph.compute(SparkGraphComputer).program(blvp).submit().get()
09:20:48 WARN org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration - Local setting ids.block-size=1000000 (Type: GLOBAL_OFFLINE) is overridden by globally managed value (10000). Use the ManagementSystem interface instead of the local configuration to control this setting.
09:21:54 ERROR org.apache.spark.executor.Executor - Managed memory leak detected; size = 78887762 bytes, TID = 2
09:21:55 WARN org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration - Local setting ids.block-size=1000000 (Type: GLOBAL_OFFLINE) is overridden by globally managed value (10000). Use the ManagementSystem interface instead of the local configuration to control this setting.
09:22:25 ERROR org.apache.spark.executor.Executor - Managed memory leak detected; size = 349007340 bytes, TID = 3
09:22:25 ERROR org.apache.spark.executor.Executor - Exception in task 0.0 in stage 5.0 (TID 3)
org.apache.tinkerpop.gremlin.process.traversal.util.FastNoSuchElementException
09:22:25 WARN org.apache.spark.scheduler.TaskSetManager - Lost task 0.0 in stage 5.0 (TID 3, localhost): org.apache.tinkerpop.gremlin.process.traversal.util.FastNoSuchElementException
09:22:25 ERROR org.apache.spark.scheduler.TaskSetManager - Task 0 in stage 5.0 failed 1 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 1 times, most recent failure: Lost task 0.0 in stage 5.0 (TID 3, localhost): org.apache.tinkerpop.gremlin.process.traversal.util.FastNoSuchElementException
Not sure why I am getting this. Is this a Spark configuration setting?
Also, I am only using factory.vertex, not the graph.addVertex method, to add vertices.
This only happens when I try to load a large number of vertices at a time.
I am running this on EC2 C4.4XL, Ubuntu Xenial, 8 cores, and 30GB RAM.