Error message: ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 33.1 GB of 33 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead or disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714.
graph config:
spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:MaxGCPauseMillis=500 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/mnt/data_1/log/spark2/gc-spark%p.log
spark.executor.cores=1
spark.executor.memory=40960m
spark.executor.instances=3
Region info:
hdfs dfs -du -h /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc
67      134     /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/.regioninfo
0       0       /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/.tmp
2.6 G   5.1 G   /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/e
0       0       /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/f
0       0       /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/g
0       0       /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/h
0       0       /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/i
0       0       /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/l
0       0       /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/m
0       0       /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/recovered.edits
0       0       /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/s
0       0       /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/t
Can anybody help me?
Hi Roy,
There seem to be three things bothering you here:
- You did not specify spark.yarn.executor.memoryOverhead, as the exception message says. Easily solved (see the sketch after this list).
- You seem to run on cloud infra that reduces your requested 40 GB to 33 GB (see https://databricks.com/session_na20/running-apache-spark-on-kubernetes-best-practices-and-pitfalls). Fact of life.
- The janusgraph HBaseInputFormat uses entire HBase regions as hadoop partitions, which are fed into spark tasks. The 2.6 GB region size is for compressed binary data, which explodes when expanded into java objects. This is your real problem.
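For the first point, a minimal sketch of the relevant lines in the same properties file as your graph config above; the values are only an assumption, the overhead is off-heap room that YARN grants in addition to the executor heap:

    # executor JVM heap
    spark.executor.memory=30g
    # off-heap allowance YARN adds on top of the heap (a plain number is MiB on Spark 2.x)
    spark.yarn.executor.memoryOverhead=4096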
I have not followed the latest status of janusgraph-hbase features for the HBaseInputFormat, but you have to somehow make spark use smaller partitions than an entire HBase region. A long time ago, I had success with skipping the HBaseInputFormat and having the spark executors connect to JanusGraph themselves (roughly as in the sketch below). That is not a quick solution, though.
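A very rough sketch of that last approach, assuming you already have a JavaRDD<Long> of vertex ids from somewhere (which is the hard part) and replacing the hostname placeholder with your own; this is an illustration, not something JanusGraph ships:

    import java.util.Collections;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
    import org.janusgraph.core.JanusGraph;
    import org.janusgraph.core.JanusGraphFactory;

    public class DirectCount {
        // Count outgoing edges without HBaseInputFormat: every Spark partition
        // opens its own OLTP connection to JanusGraph and traverses its own ids.
        static long countEdges(JavaRDD<Long> vertexIds) {
            return vertexIds.mapPartitions(ids -> {
                JanusGraph graph = JanusGraphFactory.build()
                        .set("storage.backend", "hbase")
                        .set("storage.hostname", "zk-host")  // placeholder for your ZooKeeper quorum
                        .open();
                GraphTraversalSource g = graph.traversal();
                long c = 0;
                while (ids.hasNext()) {
                    c += g.V(ids.next()).outE().count().next();
                }
                graph.close();
                return Collections.singletonList(c).iterator();
            }).reduce(Long::sum);
        }
    }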
Best wishes,
Marc
Hi Marc
Thanks for your immediate response. I've tried setting spark.yarn.executor.memoryOverhead=10G and re-running the task, and it still failed. From the spark task UI, I saw that 80% of the processing time is Full GC time. As you said, the 2.6 GB (GZ compressed) region exploding in memory is my root cause. Now I'm trying to reduce my region size to 1 GB; if that still fails, I'm going to configure the hbase hfiles to not use a compressed format. This was my first time running janusgraph OLAP, and I think this is a common problem, as an HBase region size of 2.6 GB (compressed) is not large; 20 GB is very common in our production. If the community does not solve the problem, the Janusgraph HBase based OLAP solution cannot be adopted by other companies either.
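For reference, the two HBase-side knobs mentioned above would look roughly like this; the table name 'ky415' and column family 'e' come from the region listing above, everything else is a sketch to be verified:

    <!-- hbase-site.xml: cap store-file size at ~1 GB so regions split earlier (default is 10 GB) -->
    <property>
      <name>hbase.hregion.max.filesize</name>
      <value>1073741824</value>
    </property>

    # HBase shell: drop compression on column family 'e' of table 'ky415';
    # existing HFiles only change after a major compaction
    alter 'ky415', {NAME => 'e', COMPRESSION => 'NONE'}
    major_compact 'ky415'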
Hi Roy,
As I mentioned, I did not keep up with possibly new janusgraph-hbase features. From the HBase source, I see that HBase now has a "hbase.mapreduce.tableinput.mappers.per.region" config parameter.
https://github.com/apache/hbase/blob/rel/2.1.9/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java
It should not be too difficult to adapt the janusgraph HBaseInputFormat to leverage this feature (or maybe it even works without change???).
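If the janusgraphmr.ioformat.conf.storage.hbase.ext. prefix in the Hadoop-graph properties forwards arbitrary HBase settings to the input format (an assumption on my part, please verify), something like this single line might already be enough:

    # sketch: forward the new HBase knob through the JanusGraph Hadoop-graph configuration
    janusgraphmr.ioformat.conf.storage.hbase.ext.hbase.mapreduce.tableinput.mappers.per.region=40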
Best wishes,
Marc
On Tuesday, December 8, 2020 at 04:21:19 UTC+1, Roy Yu wrote:
you seem to run on cloud infra that reduces your requested 40 Gb to 33 Gb (see https://databricks.com/session_na20/running-apache-spark-on-kubernetes-best-practices-and-pitfalls). Fact of life.
---------------------
Sorry Marc, I misled you. The error message was generated when I set spark.executor.memory to 30G; when that failed, I increased spark.executor.memory to 40G and it failed as well. I felt desperate and came here to ask for help.
Hi Marc,
Thank you for your advice; I will try it and tell you the result. Your advice is the trace of light in the dark that I so desire.
Hi Marc,
The parameter hbase.mapreduce.tableinput.mappers.per.region is effective: I set it to 40, and there are now 40 tasks processing every region. But a new problem appears, data skew. I use g.E().count() to count all the edges of the graph. While counting one region, one spark task holds all 2.6 GB of data and the other 39 tasks hold no data at all, and the job failed again. I checked my data: there are some vertices with more than 1 million incident edges. So I tried to solve this problem with a vertex cut (https://docs.janusgraph.org/advanced-topics/partitioning/); my graph schema is something like [mgmt.makeVertexLabel('product').partition().make()]. But when I used MR to load data into the new graph, it took more than 10 times as long as the attempt without partition(). From the hbase table detail page, I saw that the data loading process was busy reading data from and writing data to the first region; the first region became a hot spot. I guess it relates to vertex ids. Could you help me again?
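For context, the partitioned-label schema described above looks roughly like this through the management API; a sketch where the cluster.max-partitions value and the hostname are assumptions, and note that cluster.max-partitions is fixed when the graph is first created:

    import org.janusgraph.core.JanusGraph;
    import org.janusgraph.core.JanusGraphFactory;
    import org.janusgraph.core.schema.JanusGraphManagement;

    public class PartitionedSchema {
        public static void main(String[] args) {
            JanusGraph graph = JanusGraphFactory.build()
                    .set("storage.backend", "hbase")
                    .set("storage.hostname", "zk-host")   // placeholder
                    .set("cluster.max-partitions", 32)    // assumption; cannot be changed later
                    .open();
            JanusGraphManagement mgmt = graph.openManagement();
            // vertex cut: spread the supernode-prone label over the virtual partitions
            mgmt.makeVertexLabel("product").partition().make();
            mgmt.commit();
            graph.close();
        }
    }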
Hi Roy,
I think I would first check whether the skew is absent if you count the rows reading the HBase table directly from spark (so, without using janusgraph), e.g.:
https://stackoverflow.com/questions/42019905/how-to-use-newapihadooprdd-spark-in-java-to-read-hbase-data
If this works all right, then you know that somehow in the janusgraph HBaseInputFormat the mappers do not get the right key ranges to read from.
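A minimal sketch of such a direct count, in the spirit of the stackoverflow link above; the table name comes from your region listing, the zookeeper quorum is a placeholder:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class HBaseRowCount {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("hbase-row-count"));
            Configuration conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum", "zk-host");                    // placeholder
            conf.set(TableInputFormat.INPUT_TABLE, "ky415");                  // table from the listing
            conf.set("hbase.mapreduce.tableinput.mappers.per.region", "40");  // the TableInputFormatBase knob
            long rows = sc.newAPIHadoopRDD(conf, TableInputFormat.class,
                    ImmutableBytesWritable.class, Result.class).count();
            System.out.println("rows: " + rows);
            // The per-task record counts in the Spark UI show whether the skew
            // already exists at the HBase level or is introduced by HBaseInputFormat.
            sc.close();
        }
    }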
I also thought about the storage.hbase.region-count property of janusgraph-hbase. If you specified this as 40 while creating the graph, janusgraph-hbase would create many small regions that would be compacted by HBase later on. But maybe this creates a different structure in the row keys that can be leveraged by hbase.mapreduce.tableinput.mappers.per.region.
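That would be a graph-creation setting in the JanusGraph properties, roughly like below (a sketch; it is only honoured when JanusGraph creates the HBase table, so it requires loading into a fresh table):

    # sketch: pre-split a newly created JanusGraph table into 40 regions
    storage.backend=hbase
    # placeholder for your ZooKeeper quorum
    storage.hostname=zk-host
    storage.hbase.region-count=40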
Best wishes, Marc
Thanks Marc
Evgeniy Ignatiev <yevgeniy...@...>
Oh, I recall that we once tried to debug the same issue with JanusGraph-HBase; we had clear supernodes in the graph. No attempts at repartitioning, including analyzing the code of SparkGraphComputer and tinkering around trying to make it work for partitioned vertices etc., were successful. Apparently using Cassandra (the latest 3.x version at the time) did not lead to OOM, but it was noticeably slower than HBase when we used it with smaller graphs.
Best regards,
Evgenii Ignatev.
Thanks Evgenii