Call queue is full on /0.0.0.0:60020, too many items queued? (HBase)
Here is my problem:
We are using Cloudera 5.7.0 with Java 1.8.0_74, and we have Spark 1.6.0, JanusGraph 0.1.1, and HBase 1.2.0.
I am trying to load 200 GB of graph data, and for that I run the following code in the Gremlin shell (so far so good, everything works perfectly):
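For context, this is a minimal sketch of the standard JanusGraph 0.1.1 bulk-load recipe with SparkGraphComputer; the file names and the OneTimeBulkLoader choice below are placeholders, not necessarily the exact code in question:

// open the hadoop-graph that describes the input data and the Spark settings
graph = GraphFactory.open('conf/hadoop-graph/hadoop-load.properties')
// bulk-load the input into the JanusGraph-over-HBase graph named by the write config
blvp = BulkLoaderVertexProgram.build().
        bulkLoader(OneTimeBulkLoader).
        writeGraph('conf/janusgraph-hbase.properties').
        create(graph)
// run the vertex program on Spark and block until it finishes
graph.compute(SparkGraphComputer).program(blvp).submit().get()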
It starts executing the Spark job, and Stage-0 runs smoothly; however, at Stage-1 I get a CallQueueTooBigException:
Spark recovers the failed tasks and completes Stage-1, and Stage-2 then completes flawlessly. Since Spark persists the previous results in memory, Stage-3 and Stage-4 are skipped and Stage-5 starts. Stage-5 hits the same CallQueueTooBigException exceptions, but Spark again recovers from them.
My problem is that this stage (Stage-5) takes far too long to execute; on my last run it had been going for 14 hours when I killed the Spark job. That seems really odd for such a small amount of input data (200 GB). Normally my cluster is fast enough to load 3 TB of data into HBase (with bulk loading via MapReduce) in 1 hour. I tried increasing the number of workers, but then the CallQueueTooBigException exceptions were so frequent that the Spark job could not recover from them at all.
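As far as I understand, this exception means the RegionServer's RPC call queue has filled up, and the server-side knobs that bound that queue are the handler count and the queue-length cap in hbase-site.xml. A sketch of those two settings, with purely illustrative values rather than what we currently run:

<property>
  <name>hbase.regionserver.handler.count</name>
  <!-- number of RPC handler threads per RegionServer; the default is 30 -->
  <value>60</value>
</property>
<property>
  <name>hbase.ipc.server.max.callqueue.length</name>
  <!-- queued calls beyond this trigger CallQueueTooBigException; the default is handler.count * 10 -->
  <value>600</value>
</property>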
Is there any way that I can decrease the runtime of the job?
Below is some extra material that will hopefully lead you to the source of the problem:
Here is how I start the Gremlin shell:
export HADOOP_CONF_DIR=/etc/hadoop/conf.cloudera.yarn
exec $GREMLINHOME/bin/gremlin.sh $*
and here is my conf/hadoop-graph/hadoop-call-
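For reference, a typical JanusGraph hadoop-graph properties file for a SparkGraphComputer bulk load looks roughly like the following; every path and value here is a placeholder rather than the actual configuration:

# the graph is a TinkerPop HadoopGraph over the input data
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.inputLocation=/user/hdfs/graph-data
gremlin.hadoop.outputLocation=output
# Spark on YARN, as in CDH 5.7 / Spark 1.6
spark.master=yarn-client
spark.executor.memory=8g
spark.serializer=org.apache.spark.serializer.KryoSerializer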
Thanks in advance,