- Running OLAP on HBase with SparkGraphComputer fails on shuffle/Pregel message pass
Re: Running OLAP on HBase with SparkGraphComputer fails on shuffle/Pregel message pass
I have the same problem. Did you ever solve it?
On Wednesday, May 30, 2018 at 5:30:14 PM UTC+8 yevg...@... wrote:
Recently we hit an issue running PageRank on HBase: for comparison purposes we loaded our graph from Cassandra into an HBase deployment of the same size, and unlike on Cassandra, all attempts to run PageRank on that graph fail, with the initial cause pointing to SparkExecutor line 165 in spark-gremlin:
viewOutgoingRDD.flatMapToPair(messageFunction).reduceByKey(graphRDD.partitioner().get(), reducerFunction)
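For context, that line is the Pregel message pass: every vertex emits (targetVertexId, message) pairs, and reduceByKey shuffles them so all messages for a vertex land on one partition and get combined by the reducer (for PageRank, a sum). A minimal plain-Java sketch of that reduce-by-key semantics (no Spark; class and method names are illustrative, not spark-gremlin's):

```java
import java.util.*;

public class MessagePassSketch {
    // Conceptual stand-in for flatMapToPair(...).reduceByKey(...):
    // combine every message addressed to the same vertex id into one value.
    static Map<Long, Double> reduceByKey(List<Map.Entry<Long, Double>> messages) {
        Map<Long, Double> combined = new HashMap<>();
        for (Map.Entry<Long, Double> m : messages) {
            // For PageRank the reducer is a sum of incoming rank contributions.
            combined.merge(m.getKey(), m.getValue(), Double::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        List<Map.Entry<Long, Double>> messages = List.of(
                Map.entry(1L, 0.5), Map.entry(2L, 0.25), Map.entry(1L, 0.25));
        System.out.println(reduceByKey(messages));
    }
}
```

In the real job this step forces a full shuffle of all messages, which is why it is the first place memory pressure shows up.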
It always fails with a log message saying the container requested more memory than allowed by its configuration, e.g.:
Reason: Container killed by YARN for exceeding memory limits. 43.0 GB of 42 GB physical memory used.
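The limit YARN enforces in that message is the executor heap plus the off-heap memory overhead; in Spark 1.x/2.x the overhead defaults to max(384 MB, 10% of executor memory) via spark.yarn.executor.memoryOverhead (renamed spark.executor.memoryOverhead in Spark 2.3+). A quick check of that arithmetic, with a purely hypothetical executor size:

```java
public class YarnLimit {
    // Default Spark-on-YARN overhead: max(384 MB, 10% of executor memory), in MB.
    static long overheadMb(long executorMemMb) {
        return Math.max(384, (long) (executorMemMb * 0.10));
    }

    public static void main(String[] args) {
        long executorMemMb = 38L * 1024; // hypothetical --executor-memory 38g
        long limitMb = executorMemMb + overheadMb(executorMemMb);
        // The container is killed as soon as heap + off-heap usage crosses limitMb.
        System.out.println("YARN container limit ~ " + limitMb + " MB");
    }
}
```

The practical consequence: if the overrun is in off-heap memory (shuffle buffers, netty, Kryo), raising the heap alone just moves the limit without fixing the overshoot.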
According to the logs, the error consistently occurs during the first message-pass phase of the vertex program, right after the initial iteration.
Here is one of the configurations we tried for running OLAP on HBase, with the same Spark-related properties we use for queries on Cassandra:
# Hadoop Graph Configuration
# JanusGraph HBase InputFormat configuration
# Spark Configuration
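The concrete property values under these headings did not survive in the archive. For reference, a typical JanusGraph-on-HBase OLAP read configuration of this shape (illustrative values based on the JanusGraph documentation, not the settings actually used here) looks like:

```properties
# Hadoop Graph Configuration
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output

# JanusGraph HBase InputFormat configuration
janusgraphmr.ioformat.conf.storage.backend=hbase
janusgraphmr.ioformat.conf.storage.hostname=127.0.0.1
janusgraphmr.ioformat.conf.storage.hbase.table=janusgraph

# Spark Configuration
spark.master=yarn
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator
```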
We tried increasing memory per executor as much as we could and tweaking gremlin.spark.graphStorageLevel, without any success.
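Since the kill happens at the YARN limit rather than as a JVM OutOfMemoryError, knobs other than the heap size may matter more. A hypothetical set of adjustments worth trying (not a confirmed fix; values are placeholders):

```properties
# Raise off-heap headroom (MB); called spark.executor.memoryOverhead in Spark 2.3+
spark.yarn.executor.memoryOverhead=4096
# Lower the heap so heap + overhead still fits under the YARN container limit
spark.executor.memory=32g
# Spill the cached graph RDD to disk instead of pinning it in executor memory
gremlin.spark.graphStorageLevel=DISK_ONLY
```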
Has anybody experienced similar issues running SparkGraphComputer with HBaseInputFormat/HBaseSnapshotInputFormat, or perhaps on other backends?