Re: Managed memory leaks using the BulkLoaderVertexProgram
amark...@...
Hi Jason,

Please see my comments below.

I created the schema beforehand, and the graph is empty when I start the load.

I had increased ids.block-size to 100K (the default is 10K). The JanusGraph docs say that, as a rule of thumb, you should increase it 10x when doing bulk uploads. I have since changed it back to the default of 10K.

script_challenge.groovy

    def parse(line, factory) {
        def (cId, cType, cUserId, cCreatedDate) = line.split(/,/).toList()
        def v1 = factory.vertex(cId, "Challenge")
        v1.property("challengeId", cId)                           // first column: challenge ID, also used as the vertex ID
        v1.property("challengeType", Short.parseShort(cType))     // second column: challenge type
        v1.property("creatorUserId", Long.parseLong(cUserId))     // third column: creator's user ID
        v1.property("challengeCreatedDate", Date.parse("yyyy-MM-dd", cCreatedDate)) // fourth column: creation date (MM = month; lowercase mm would mean minutes)
        return v1
    }

Sample data (I have about 200 MM rows of such data; I get the errors while loading about 1.4 MM of them):

    challenge-1,2,2,2016-04-04
    challenge-2,1,1,2016-04-03

The load mostly does a partial commit: out of my 1.4 MM rows, I can load about 200K at a time before I get an error.

Steps taken to isolate the source of the error

1) Increased the max and initial heap size on EC2:

    java -XX:+PrintFlagsFinal -Xms2g -Xmx21g -version | grep HeapSize

2) Changed gremlin.sh to launch with more memory (last line in gremlin.sh):

    exec $JAVA -Xmx16g $JAVA_OPTIONS $MAIN_CLASS "$@"

3) Increased the memory for executors:

    spark.master=local[4]
    spark.executor.memory=3g

My observations

I think the problem is with using bulkLoader(OneTimeBulkLoader) in

    BulkLoaderVertexProgram.build().bulkLoader(OneTimeBulkLoader).writeGraph("/home/ubuntu/janusgraphdocker/conf/janusgraph-cassandra-es.properties").create(graph)

If I go with the default (the incremental loader) by using

    blvp = BulkLoaderVertexProgram.build().writeGraph("/home/ubuntu/janusgraphdocker/conf/janusgraph-cassandra-es.properties").create(graph)

I don't get the Spark executor memory error. But with this I get warning messages:

    WARN org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Query requires iterating over all vertice

and the insert is very slow.

So I think the problem is bulkLoader(OneTimeBulkLoader). Any thoughts?

Thanks for looking into it. I appreciate it.

On Sunday, October 8, 2017 at 12:02:46 PM UTC-7, Jason Plurad wrote:
|
|
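
A side note on the date pattern passed to Date.parse in loader scripts like script_challenge.groovy: as far as I know, Groovy's Date.parse(format, input) relies on java.text.SimpleDateFormat, where lowercase mm means minutes and uppercase MM means month. Below is a minimal Java sketch of how the two patterns treat a value from the sample data. The class and method names (DatePatternCheck, parsedMonth) are made up for illustration, and UTC is assumed so the result does not depend on the machine's time zone. This is probably unrelated to the memory errors, but it affects the stored dates.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical helper, not part of TinkerPop or JanusGraph.
class DatePatternCheck {

    // Parses text with the given SimpleDateFormat pattern (in UTC)
    // and returns the calendar month of the result, 1 to 12.
    static int parsedMonth(String pattern, String text) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        SimpleDateFormat fmt = new SimpleDateFormat(pattern);
        fmt.setTimeZone(utc);
        try {
            Date parsed = fmt.parse(text);
            Calendar cal = new GregorianCalendar(utc);
            cal.setTime(parsed);
            return cal.get(Calendar.MONTH) + 1; // Calendar.MONTH is zero-based
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + text, e);
        }
    }

    public static void main(String[] args) {
        // "mm" reads 04 as minutes, so the month is never set and falls back to January.
        System.out.println(parsedMonth("yyyy-mm-dd", "2016-04-04")); // prints 1
        // "MM" reads 04 as the month, which is what the data means.
        System.out.println(parsedMonth("yyyy-MM-dd", "2016-04-04")); // prints 4
    }
}
```

With the lowercase pattern every row would parse without an error but be stored as a January date, so this kind of mistake only shows up when the loaded values are inspected.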