Re: Heartbeat error when using BulkUploader


ste...@...

Also, the longest line in the adjacency list is 17,179,092 characters long. That equates to about 128,000 edges for that particular vertex.
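
That works out to roughly 134 characters per edge entry (17,179,092 / 128,000 ≈ 134). For anyone wanting to run the same check on their own input, here is a minimal sketch of one way to do it, assuming a plain-text adjacency list with one vertex per line (the class name and the command-line path are illustrative):

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Minimal sketch: scan an adjacency-list file and report its longest line,
// a quick way to spot super-node vertices before attempting a bulk load.
public class LongestLine {
    public static void main(String[] args) throws IOException {
        long maxLen = 0, lineNo = 0, maxLineNo = 0;
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(args[0]))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lineNo++;
                if (line.length() > maxLen) {
                    maxLen = line.length();
                    maxLineNo = lineNo;
                }
            }
        }
        System.out.printf("Longest line: %,d characters (line %,d)%n", maxLen, maxLineNo);
    }
}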


On Tuesday, July 24, 2018 at 11:37:07 AM UTC-7, s...@... wrote:

It happened again, so I've included a screenshot of the error.

On Monday, July 23, 2018 at 11:13:14 PM UTC-7, Debasish Kanhar wrote:
Hi,

Can you share the full stack trace or, better, the logs? Maybe that will give us more clarity on why you are facing this error. This can be due to any number of reasons. I also remember getting a heartbeat timeout when my connection to the backend Cassandra was failing.

Also, sharing the configuration you are specifying may help.

On Tuesday, 24 July 2018 10:17:12 UTC+5:30, s...@... wrote:
I'm consistently having issues using the BulkLoaderVertexProgram to load a very large (800 GB) graph into JanusGraph. I have it split into 110 files (time-based edge separation) so the files are each about 8 GB, and I am iterating over them one by one to upload. I haven't been able to get past the first file. I have 64 cores, with Spark executor and worker memory of 6 GB each (416 GB on the machine). I've tried a number of different configurations to no avail. What is really killing productivity is how long it takes for the system to fail (about 5 hours), which makes it hard to iterate on debugging. The latest error I'm having is:



[Stage 5:==
03:13:44 WARN  org.apache.spark.rpc.netty.NettyRpcEnv  - Ignored message: HeartbeatResponse(false)
03:14:31 WARN  org.apache.spark.rpc.netty.NettyRpcEndpointRef  - Error sending message [message = Heartbeat(driver,[Lscala.Tuple2;@3f7a6c8,BlockManagerId(driver, localhost, 40607))] in 1 attempts
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10 seconds]. This timeout is controlled by spark.executor.heartbeatInterval


This is an urgent issue for me and any help is greatly appreciated.
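
For reference, here is a minimal sketch of the kind of job described above, using TinkerPop's BulkLoaderVertexProgram over SparkGraphComputer, with the timeout named in the exception raised. The property file paths are illustrative, and the timeout values are starting points to experiment with, not known-good numbers:

import org.apache.commons.configuration.Configuration;
import org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram;
import org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer;
import org.apache.tinkerpop.gremlin.structure.Graph;
import org.apache.tinkerpop.gremlin.structure.util.GraphFactory;

public class BulkLoad {
    public static void main(String[] args) throws Exception {
        // Input side: a HadoopGraph pointing at one ~8 GB slice of the dataset
        // ("conf/hadoop-load.properties" is an illustrative path).
        Graph readGraph = GraphFactory.open("conf/hadoop-load.properties");

        // Raise the timeout named in the exception. Spark's executor heartbeat
        // defaults to 10s. These settings would normally live in the properties
        // file; they are set programmatically here only to keep the sketch
        // self-contained.
        Configuration conf = readGraph.configuration();
        conf.setProperty("spark.executor.heartbeatInterval", "120s");
        conf.setProperty("spark.network.timeout", "600s");

        // Write side: the target JanusGraph instance (illustrative path).
        BulkLoaderVertexProgram blvp = BulkLoaderVertexProgram.build()
                .writeGraph("conf/janusgraph-cassandra.properties")
                .create(readGraph);

        // Run the bulk load on Spark and block until it finishes.
        readGraph.compute(SparkGraphComputer.class).program(blvp).submit().get();
        readGraph.close();
    }
}

Note that Spark requires spark.network.timeout (default 120 s) to be larger than spark.executor.heartbeatInterval; raising both may at least rule out heartbeat drops during long GC pauses as the failure mode.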
