Re: Managed memory leaks using the BulkLoaderVertexProgram


amark...@...
 

Hey Ted,

I tried setting gremlin.spark.persistStorageLevel=DISK_ONLY, but I am still getting:

7:53:16 ERROR org.apache.spark.executor.Executor  - Managed memory leak detected; size = 36796560 bytes, TID = 2

17:53:47 ERROR org.apache.spark.executor.Executor  - Managed memory leak detected; size = 167934858 bytes, TID = 3

17:53:47 ERROR org.apache.spark.executor.Executor  - Exception in task 0.0 in stage 5.0 (TID 3)


Thanks 
Ashish
On Monday, October 9, 2017 at 9:09:01 AM UTC-7, Ted Wilmes wrote:
You should be able to use Spark in standalone mode, just as it comes with JanusGraph, to load that amount of data. Here is one more thing you can try: I usually run bulk loads with this set in my script input properties file:

gremlin.spark.persistStorageLevel=DISK_ONLY

This tells Spark to persist intermediate RDD results to disk temporarily instead of memory, which in my experience works well for bulk loading and should decrease the pressure on your executor's memory.
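(For anyone following along: this goes in the same properties file you pass to GraphFactory.open for the HadoopGraph — hadoop-script.properties in this thread — alongside the other Spark settings, e.g.:

spark.master=local[4]
gremlin.spark.persistStorageLevel=DISK_ONLY
)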

--Ted

On Monday, October 9, 2017 at 4:35:30 AM UTC-5, am...@... wrote:
I guess my bigger question is:

What do I need to set up in the environment to get the bulk loader to work?

I have installed a local instance of Hadoop on my EC2 instance and added this:

export HADOOP_PREFIX=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export YARN_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
# Put the Hadoop configuration on the classpath so HDFS doesn't resolve to the local filesystem
export CLASSPATH=$HADOOP_CONF_DIR
export HADOOP_GREMLIN_LIBS=/home/ubuntu/janusgraph-0.1.1-hadoop2/ext/spark-gremlin/lib:/home/ubuntu/janusgraph-0.1.1-hadoop2/ext/spark-gremlin/lib2
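(One quick way I'd verify that the console really sees HDFS rather than the local filesystem — a sketch, assuming the hadoop-gremlin plugin is activated so the hdfs variable is bound in the Gremlin Console:

hdfs.ls()        // should list your HDFS home directory, not the local filesystem
hdfs.ls('data')  // the directory the input CSV gets copied into
)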


This command from the TinkerPop docs makes me think that I also need to install Spark:

bin/init-tp-spark.sh /usr/local/spark s...@....0.1 s...@....0.2 s...@....0.3


I guess my question is: what are the config settings for this? I am using SparkGraphComputer only for bulk loading data.
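(For reference, the only Spark-related settings I have are the ones in the hadoop-script properties further down this thread. My understanding is that with spark.master=local[4] Spark runs embedded in the console JVM, so no separate Spark install or init-tp-spark.sh should be needed:

spark.master=local[4]
spark.executor.memory=4g
spark.serializer=org.apache.spark.serializer.KryoSerializer
)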

Cheers
A

On Sunday, 8 October 2017 20:30:02 UTC-7, am...@... wrote:
Hi Jason,

Please see my comments below. 

I had created the schema beforehand; it's an empty graph when I start loading the data.

I had increased ids.block-size to 100K (the default is 10K). The JanusGraph docs say that, as a rule of thumb, you should increase it roughly 10x if you are doing bulk uploads, but I have since changed it back to the default 10K.
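(Because ids.block-size is GLOBAL_OFFLINE, the value in the local properties file gets overridden by the globally managed one, as the warning further down shows. If I did want to raise it, my understanding is that it has to go through the ManagementSystem with no other JanusGraph instances open — a rough sketch only:

// run with all other JanusGraph instances closed (ids.block-size is GLOBAL_OFFLINE)
jg = JanusGraphFactory.open('/home/ubuntu/janusgraphdocker/conf/janusgraph-cassandra-es.properties')
mgmt = jg.openManagement()
mgmt.set('ids.block-size', 100000)
mgmt.commit()
jg.close()
)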

script_challenge.groovy

def parse(line, factory) {
    // CSV layout: challengeId, challengeType, creatorUserId, createdDate
    def (cId, cType, cUserId, cCreatedDate) = line.split(/,/).toList()

    def v1 = factory.vertex(cId, "Challenge")
    v1.property("challengeId", cId)
    v1.property("challengeType", Short.parseShort(cType))
    v1.property("creatorUserId", Long.parseLong(cUserId))
    // "MM" = month of year (uppercase MM, not mm, which is minutes)
    v1.property("challengeCreatedDate", Date.parse("yyyy-MM-dd", cCreatedDate))
    return v1
}
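(A quick way to sanity-check the parse logic outside Hadoop, using one of the sample rows below, is plain Groovy in the console; nothing here touches the factory:

line = "challenge-1,2,2,2016-04-04"
def (cId, cType, cUserId, cCreatedDate) = line.split(/,/).toList()
assert cId == "challenge-1"
assert Short.parseShort(cType) == 2
assert Long.parseLong(cUserId) == 2L
assert Date.parse("yyyy-MM-dd", cCreatedDate).format("yyyy-MM-dd") == "2016-04-04"
)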


Sample data: I have about 200 MM rows of such data. I am loading about 1.4 MM rows when I get the errors.

challenge-1,2,2,2016-04-04

challenge-2,1,1,2016-04-03



In the error output, the FastNoSuchElementException happens after I get the executor memory error, so I believe the memory error is triggering it.

It mostly does a partial commit, so out of my 1.4 MM rows I can load about ~200K at a time before I get an error.

Steps taken to isolate the error source:

1) Increased the max and initial heap size on EC2: java -XX:+PrintFlagsFinal -Xms2g -Xmx21g -version | grep HeapSize
2) Changed gremlin.sh to launch with more memory (last line in gremlin.sh): exec $JAVA -Xmx16g $JAVA_OPTIONS $MAIN_CLASS "$@"
3) Increased the memory for the executors:

spark.master=local[4]

spark.executor.memory=3g
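(One caveat, as I understand it: with spark.master=local[4] the "executors" run inside the Gremlin Console JVM, so spark.executor.memory has little effect there; the -Xmx set in gremlin.sh is what actually bounds the heap. A quick way to see the effective limit from the console:

// max heap of the JVM that the local-mode Spark tasks actually run in
println "${Runtime.getRuntime().maxMemory().intdiv(1024 * 1024)} MB"
)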


My observations 

  • The graph.addVertex method is not supported for batch upload; TinkerPop 3.1.8 and 3.2.6 support it. I have to use the older factory.vertex method.
  • After trying all these combinations and some others, I was still getting the same executor out-of-memory error.
I think the problem is with using bulkLoader(OneTimeBulkLoader) in:
BulkLoaderVertexProgram.build().bulkLoader(OneTimeBulkLoader).writeGraph("/home/ubuntu/janusgraphdocker/conf/janusgraph-cassandra-es.properties").create(graph)

If I go with the default (the incremental loader) by using this:
blvp = BulkLoaderVertexProgram.build().writeGraph("/home/ubuntu/janusgraphdocker/conf/janusgraph-cassandra-es.properties").create(graph)

I don't get the Spark executor memory error.

But using this, I get warning messages:

WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertice


And the insert is very slow.

So, I think the problem is bulkLoader(OneTimeBulkLoader). Any thoughts? 
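(Side note on the "iterating over all vertices" warning: as I understand it, the incremental loader looks up existing vertices by the bulkLoader.vertex.id property, so without an index on that key every lookup is a full graph scan. A sketch of what indexing it might look like — assuming the default key name and an empty graph; the index name here is just illustrative:

jg = JanusGraphFactory.open('/home/ubuntu/janusgraphdocker/conf/janusgraph-cassandra-es.properties')
mgmt = jg.openManagement()
blid = mgmt.makePropertyKey('bulkLoader.vertex.id').dataType(String.class).make()
mgmt.buildIndex('byBulkLoaderVertexId', Vertex.class).addKey(blid).buildCompositeIndex()
mgmt.commit()
jg.close()
)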

Thanks for looking into it. I appreciate it 


On Sunday, October 8, 2017 at 12:02:46 PM UTC-7, Jason Plurad wrote:
org.apache.tinkerpop.gremlin.process.traversal.util.FastNoSuchElementException usually indicates you're doing an unchecked call to next() in your traversal.

Are you able to share script_challenge.groovy? Also it seems like you should sort out what's going on with ids.block-size. I'd recommend creating the empty graph before running the bulk loader.

Have you tried increasing the executor memory?


On Sunday, October 8, 2017 at 2:23:55 PM UTC-4, amarkanday wrote:
I keep running into an error.

Script:

start = new Date();
hdfs.copyFromLocal("/home/ubuntu/example/data/ctest.csv", "data/ctest.csv")
hdfs.copyFromLocal("/home/ubuntu/example/scripts/script_challenge.groovy", "scripts/script_challenge.groovy")

graph = GraphFactory.open("/home/ubuntu/janusgraphdocker/conf/hadoop-script.properties")
blvp = BulkLoaderVertexProgram.build().bulkLoader(OneTimeBulkLoader).writeGraph("/home/ubuntu/janusgraphdocker/conf/janusgraph-cassandra-es.properties").create(graph)
graph.compute(SparkGraphComputer).program(blvp).submit().get()
stop = new Date();
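(After a run, a quick way to see how much actually landed in JanusGraph — a sketch, reusing the same janusgraph-cassandra-es.properties; the count is a full scan, so it is slow on a big graph:

jg = JanusGraphFactory.open("/home/ubuntu/janusgraphdocker/conf/janusgraph-cassandra-es.properties")
g = jg.traversal()
g.V().hasLabel("Challenge").count()   // confirms how many Challenge vertices made it in
jg.close()
)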


The input file is about 1.7 MM nodes with 3 properties each.

hadoop-script.properties:

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.script.ScriptInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONOutputFormat
gremlin.hadoop.jarsInDistributedCache=true

gremlin.hadoop.inputLocation=data/ctest.csv
gremlin.hadoop.scriptInputFormat.script=scripts/script_challenge.groovy
gremlin.hadoop.outputLocation=output

####################################
# SparkGraphComputer Configuration #
####################################
spark.master=local[4]
spark.executor.memory=4g
spark.serializer=org.apache.spark.serializer.KryoSerializer


Error

graph.compute(SparkGraphComputer).program(blvp).submit().get()

09:20:48 WARN  org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration  - Local setting ids.block-size=1000000 (Type: GLOBAL_OFFLINE) is overridden by globally managed value (10000).  Use the ManagementSystem interface instead of the local configuration to control this setting.

09:21:54 ERROR org.apache.spark.executor.Executor  - Managed memory leak detected; size = 78887762 bytes, TID = 2

09:21:55 WARN  org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration  - Local setting ids.block-size=1000000 (Type: GLOBAL_OFFLINE) is overridden by globally managed value (10000).  Use the ManagementSystem interface instead of the local configuration to control this setting.

09:22:25 ERROR org.apache.spark.executor.Executor  - Managed memory leak detected; size = 349007340 bytes, TID = 3

09:22:25 ERROR org.apache.spark.executor.Executor  - Exception in task 0.0 in stage 5.0 (TID 3)

org.apache.tinkerpop.gremlin.process.traversal.util.FastNoSuchElementException

09:22:25 WARN  org.apache.spark.scheduler.TaskSetManager  - Lost task 0.0 in stage 5.0 (TID 3, localhost): org.apache.tinkerpop.gremlin.process.traversal.util.FastNoSuchElementException


09:22:25 ERROR org.apache.spark.scheduler.TaskSetManager  - Task 0 in stage 5.0 failed 1 times; aborting job

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 1 times, most recent failure: Lost task 0.0 in stage 5.0 (TID 3, localhost): org.apache.tinkerpop.gremlin.process.traversal.util.FastNoSuchElementException



Not sure why I am getting this. Is this a Spark configuration setting?

Also, I am only using factory.vertex, not the graph.addVertex method, to add vertices.

This only happens when I try to load a large number of vertices at a time.

I am running this on an EC2 C4.4XL instance with Ubuntu Xenial, 8 cores, and 30 GB RAM.





