Running OLAP on HBase with SparkGraphComputer fails on shuffle/Pregel message pass
Yevgeniy Ignatyev <yevgeniy...@...>
Hello.
Recently we faced an issue running PageRank on HBase: for comparison purposes we loaded our graph from Cassandra into an HBase deployment of the same size, and unlike on Cassandra, all attempts to run PageRank on that graph fail, with the initial cause pointing to SparkExecutor line 165 in spark-gremlin:
viewOutgoingRDD.flatMapToPair(messageFunction).reduceByKey(graphRDD.partitioner().get(), reducerFunction)
The failure is always accompanied by a log message saying that the container requested more memory than allowed by its configuration, for example:
Reason: Container killed by YARN for exceeding memory limits. 43.0 GB of 42 GB physical memory used.
According to the logs, the error consistently occurs during the first message-pass phase of the vertex program, right after the initial iteration.
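For context, this is roughly how we submit the PageRank computation (a minimal sketch; the class name and properties path are illustrative, the actual job is equivalent):

import org.apache.tinkerpop.gremlin.process.computer.ComputerResult;
import org.apache.tinkerpop.gremlin.process.computer.ranking.pagerank.PageRankVertexProgram;
import org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer;
import org.apache.tinkerpop.gremlin.structure.Graph;
import org.apache.tinkerpop.gremlin.structure.util.GraphFactory;

public class HBasePageRankJob {
    public static void main(String[] args) throws Exception {
        // Open the HadoopGraph using the properties shown below (path is illustrative).
        Graph graph = GraphFactory.open("conf/read-hbase-olap.properties");
        try {
            ComputerResult result = graph
                    .compute(SparkGraphComputer.class)
                    .program(PageRankVertexProgram.build().create(graph))
                    .submit()
                    .get();
            // We never reach this point; the job dies during the first message pass.
            System.out.println(result.memory().asMap());
        } finally {
            graph.close();
        }
    }
}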
Here is one of the configurations we used to try OLAP on HBase, with the same Spark-related properties we use for queries on Cassandra:
gremlin.hadoop.defaultGraphComputer=org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer
storage.backend=hbase
storage.hbase.snapshot-name=jsnapshot
#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.hbase.HBaseSnapshotInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
#
# JanusGraph HBase InputFormat configuration
#
janusgraphmr.ioformat.conf.storage.backend=hbase
janusgraphmr.ioformat.conf.storage.hbase.table=janusgraph
#
# Spark Configuration
#
spark.master=yarn
spark.deploy.mode=cluster
spark.executor.memory=12g
spark.driver.memory=2g
spark.executor.cores=4
spark.executor.instances=12
spark.serializer=org.apache.spark.serializer.KryoSerializer
We tried increasing the memory per executor as much as we could and tweaking gremlin.spark.graphStorageLevel, without any success.
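To be concrete, the kinds of properties we have been varying look roughly like this (values purely illustrative, not a working fix; spark.yarn.executor.memoryOverhead is included only as an example of the off-heap headroom knob for Spark on YARN, in case that is relevant here, and gremlin.spark.graphStorageLevel controls how the graph RDD is persisted):

# Illustrative overrides only
spark.executor.memory=20g
spark.yarn.executor.memoryOverhead=4096
gremlin.spark.graphStorageLevel=MEMORY_AND_DISK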
Has anybody experienced similar issues running SparkGraphComputer with HBaseInputFormat/HBaseSnapshotInputFormat, or perhaps with other backends?
Best regards,
Evgeniy Ignatiev.