Running OLAP on HBase with SparkGraphComputer fails on shuffle/Pregel message pass


Yevgeniy Ignatyev <yevgeniy...@...>
 

Hello.

Recently we ran into an issue running PageRank on HBase. For comparison purposes we loaded our graph from Cassandra into an HBase deployment of the same size. Unlike on Cassandra, every attempt to run PageRank on that graph fails, with the initial cause pointing to line 165 of SparkExecutor in spark-gremlin:

viewOutgoingRDD.flatMapToPair(messageFunction).reduceByKey(graphRDD.partitioner().get(), reducerFunction)

The failure is always accompanied by a log message saying that the container requested more memory than its configuration allows, for example:

Reason: Container killed by YARN for exceeding memory limits. 43.0 GB of 42 GB physical memory used.

According to the logs, the error consistently occurs in the first message-passing phase of the vertex program, right after the initial iteration.
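
For readers less familiar with that step: the quoted line emits one (target vertex, message) pair per outgoing edge and then combines the messages addressed to the same target. Below is a plain-Java analogue of that pattern, not the spark-gremlin code itself. The class and method names are hypothetical, and a PageRank-style message (rank divided by out-degree) is assumed purely for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: an in-memory analogue of flatMapToPair(...).reduceByKey(...).
class MessagePassSketch {

    // "flatMapToPair": every vertex emits one message per outgoing edge.
    // "reduceByKey": messages for the same target vertex are summed.
    static Map<Integer, Double> messagePass(Map<Integer, List<Integer>> outEdges,
                                            Map<Integer, Double> rank) {
        Map<Integer, Double> combined = new TreeMap<>();
        for (Map.Entry<Integer, List<Integer>> entry : outEdges.entrySet()) {
            List<Integer> targets = entry.getValue();
            if (targets.isEmpty()) {
                continue; // a vertex without outgoing edges sends nothing
            }
            double share = rank.get(entry.getKey()) / targets.size();
            for (int target : targets) {
                combined.merge(target, share, Double::sum);
            }
        }
        return combined;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> outEdges = new HashMap<>();
        outEdges.put(1, Arrays.asList(2, 3));
        outEdges.put(2, Arrays.asList(3));
        outEdges.put(3, Collections.emptyList());

        Map<Integer, Double> rank = new HashMap<>();
        rank.put(1, 1.0);
        rank.put(2, 1.0);
        rank.put(3, 1.0);

        System.out.println(messagePass(outEdges, rank));
    }
}
```

The number of emitted pairs grows with the number of edges rather than the number of vertices, which is why this step is the first one in the program to move a large volume of data between executors.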

Here is one of the configurations we tried for running OLAP on HBase, with the same Spark-related properties we use for queries on Cassandra:

gremlin.hadoop.defaultGraphComputer=org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer

storage.backend=hbase
storage.hbase.snapshot-name=jsnapshot

#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.hbase.HBaseSnapshotInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat

gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output

#
# JanusGraph HBase InputFormat configuration
#
janusgraphmr.ioformat.conf.storage.backend=hbase
janusgraphmr.ioformat.conf.storage.hbase.table=janusgraph

#
# Spark Configuration
#
spark.master=yarn
spark.deploy.mode=cluster
spark.executor.memory=12g
spark.driver.memory=2g
spark.executor.cores=4
spark.executor.instances=12
spark.serializer=org.apache.spark.serializer.KryoSerializer
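
The limit reported by YARN is the container size, which is larger than spark.executor.memory. Here is a rough sketch of the relation. It assumes Spark's documented default overhead of max(384 MiB, 10% of executor memory), since no spark.executor.memoryOverhead is set in the properties above, and it ignores YARN's rounding to its allocation increment. The class and method names are illustrative only.

```java
// Hypothetical sketch: estimates the YARN container request for one Spark executor.
class ContainerMemorySketch {

    // Assumed defaults, taken from the Spark configuration documentation.
    static final long MIN_OVERHEAD_MIB = 384;
    static final double OVERHEAD_FACTOR = 0.10;

    // Off-heap overhead added on top of the executor heap.
    static long overheadMiB(long executorMemoryMiB) {
        return Math.max(MIN_OVERHEAD_MIB, (long) (OVERHEAD_FACTOR * executorMemoryMiB));
    }

    // Heap plus overhead: the amount YARN enforces as the physical memory limit.
    static long containerRequestMiB(long executorMemoryMiB) {
        return executorMemoryMiB + overheadMiB(executorMemoryMiB);
    }

    public static void main(String[] args) {
        long executorMiB = 12L * 1024; // spark.executor.memory=12g
        System.out.println("overhead MiB:  " + overheadMiB(executorMiB));
        System.out.println("container MiB: " + containerRequestMiB(executorMiB));
    }
}
```

With the 12g setting above this gives roughly 13.2 GiB per container, so the 42 GB limit in the quoted message presumably comes from one of the runs with increased executor memory.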

We tried increasing the memory per executor as far as we could and tweaking gremlin.spark.graphStorageLevel, without any success.

Has anybody experienced similar issues running SparkGraphComputer with HBaseInputFormat/HBaseSnapshotInputFormat, or possibly with other backends?

Best regards,
Evgeniy Ignatiev.

