Re: Running OLAP on HBase with SparkGraphComputer fails with Error Container killed by YARN for exceeding memory limits


HadoopMarc <bi...@...>
 

Hi Roy,

There seem to be three things bothering you here:
  1. you did not specify spark.yarn.executor.memoryOverhead, as the exception message says. Easily solved.
  2. you seem to run on cloud infra that reduces your requested 40 Gb to 33 Gb (see https://databricks.com/session_na20/running-apache-spark-on-kubernetes-best-practices-and-pitfalls). Fact of life.
  3. the janusgraph HBaseInputFormat use sentire HBase regions as hadoop partitions, which are fed into spark tasks. The 2.6Gb region size is for compressed binary data which explodes when expanded into java objects. This is your real problem.
I did not follow the latest status of janusgraph-hbase features for the HBaseInputFormat, but you have to somehow use spark with smaller partitions than an entire HBase region.
A long time ago, I had success with skipping the HBaseInputFormat and have spark executors connect to JanusGraph themselves. That is not a quick solution, though.

Best wishes,

Marc

Op maandag 7 december 2020 om 14:10:55 UTC+1 schreef Roy Yu:

Error message:
ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 33.1 GB of 33 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead or disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714. 

 graph conifg:
spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:MaxGCPauseMillis=500 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/mnt/data_1/log/spark2/gc-spark%p.log
spark.executor.cores=1
spark.executor.memory=40960m
spark.executor.instances=3

Region info:
hdfs dfs -du -h /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc
67     134    /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/.regioninfo
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/.tmp
2.6 G  5.1 G  /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/e
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/f
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/g
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/h
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/i
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/l
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/m
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/recovered.edits
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/s
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/t
root@~$

Anybody who can help me?

Join janusgraph-users@lists.lfaidata.foundation to automatically receive all group messages.