Re: Running OLAP on HBase with SparkGraphComputer fails with Error Container killed by YARN for exceeding memory limits
Roy Yu <7604...@...>
Hi Marc,
The parameter hbase.mapreduce.tableinput.mappers.per.region is effective. I set it to 40, and now 40 tasks process every region. But that brings a new problem: data skew. I used g.E().count() to count all the edges of the graph (my OLAP setup is sketched below). While one region was being counted, a single Spark task held all 2.6 GB of its data while the other 39 tasks held no data at all, and the task failed again. I checked my data: some vertices have more than 1 million incident edges.

So I tried to solve this with a vertex cut (https://docs.janusgraph.org/advanced-topics/partitioning/); my graph schema is something like [mgmt.makeVertexLabel('product').partition().make()] (see the schema sketch below). But when I used MapReduce to load data into the new graph, it took more than 10 times as long as the attempt without partition(). On the HBase table detail page I saw that the loading process was busy reading from and writing to the first region; the first region became a hot spot. I guess this relates to how vertex ids are assigned. Could you help me again?
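For reference, my OLAP read configuration looks something like the sketch below (hostnames and the table name are placeholders). As far as I understand, TinkerPop copies the Hadoop-Gremlin properties into the job's Hadoop Configuration, which is how the per-region setting reaches the HBase input format:

# read-hbase.properties (sketch, placeholder values)
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
janusgraphmr.ioformat.conf.storage.backend=hbase
janusgraphmr.ioformat.conf.storage.hostname=zookeeper-host
janusgraphmr.ioformat.conf.storage.hbase.table=janusgraph
# split every region into 40 input splits
hbase.mapreduce.tableinput.mappers.per.region=40

The count itself runs in the Gremlin Console:

graph = GraphFactory.open('read-hbase.properties')
g = graph.traversal().withComputer(SparkGraphComputer)
g.E().count()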
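The partitioned schema was created on a brand-new graph, something like this (the cluster.max-partitions value of 32 is just an example; as I understand it, this option must be set before the graph is first initialized for partition() to take effect):

// Gremlin Console, against a fresh HBase table
graph = JanusGraphFactory.build().
    set('storage.backend', 'hbase').
    set('storage.hostname', 'zookeeper-host').
    set('cluster.max-partitions', 32).
    open()
mgmt = graph.openManagement()
mgmt.makeVertexLabel('product').partition().make()
mgmt.commit()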
On Tuesday, December 8, 2020 at 3:13:42 PM UTC+8 HadoopMarc wrote: