GC overhead limit exceeded while removing an index using MapReduce

Shiva Krishnan <shivain...@...>


I'm getting the error below while removing a graph index using MapReduce.

2020-05-17 13:29:12,729 INFO [main-SendThread(edcdeployment008.informatica.com:2181)] org.apache.zookeeper.ClientCnxn: Socket connection established, initiating session, client: /, server: edcdeployment008.informatica.com/
2020-05-17 13:30:12,358 ERROR [main-SendThread(edcdeployment006.informatica.com:2181)] org.apache.zookeeper.ClientCnxn: from main-SendThread(edcdeployment006.informatica.com:2181)
java.lang.OutOfMemoryError: GC overhead limit exceeded
2020-05-17 13:31:08,162 WARN [main-SendThread(edcdeployment008.informatica.com:2181)] org.apache.zookeeper.ClientCnxn: Session 0x37217082f991cd5 for server edcdeployment008.informatica.com/, unexpected error, closing socket connection and attempting reconnect
java.lang.OutOfMemoryError: GC overhead limit exceeded
2020-05-17 13:31:15,597 ERROR [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded

On analyzing the heap dump, we found that the mapper has 154 Results loaded. Most of them are very small, except for one row that has 10+ million cells. That single Result takes up 3.5GB of memory. I have tried increasing the heap memory to 15GB, but it didn't help. (Heap dump snapshot attached below.)
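
As a rough sanity check on those numbers, here is a back-of-the-envelope sketch of the average heap footprint per cell in the large row. It assumes 3.5GB means 3.5 * 2^30 bytes and takes the cell count as exactly 10 million (the real count is higher, so this is an upper bound). The class and method names are made up for illustration only.

```java
public class RowFootprint {

    // Average number of heap bytes per cell, rounded down (integer division).
    static long bytesPerCell(long totalBytes, long cellCount) {
        return totalBytes / cellCount;
    }

    public static void main(String[] args) {
        // Assumption: "3.5GB" from the heap dump is read as 3.5 * 2^30 bytes.
        long resultBytes = (long) (3.5 * 1024 * 1024 * 1024);
        // Assumption: "10+ million cells" is taken as exactly 10 million.
        long cells = 10_000_000L;

        System.out.println(bytesPerCell(resultBytes, cells) + " bytes per cell (upper bound)");
    }
}
```

So the problem does not seem to be that individual cells are large; it is the number of cells being held in one Result at the same time.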

Is there any way to fix this issue other than increasing the heap memory?