Dealing with org.apache.thrift.transport.TTransportException


Nigel Brown <nigel...@...>
 

The problem below existed in Titan and, under some circumstances, still occurs in JanusGraph:

3:18:37 WARN  org.apache.hadoop.mapred.LocalJobRunner  - job_local1757555819_0001
java.lang.Exception: java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Frame size (19784570) larger than max length (15728640)!
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:403)
Caused by: java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Frame size (19784570) larger than max length (15728640)!


I can raise the maximum frame size in the config

storage.cassandra.frame-size-mb=50

to get past the problem, but that only pushes the limit further out. Worse, very large frames can send Cassandra into a GC loop, which makes it grind to a halt.
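For anyone trying the workaround: the frame limit is enforced on both sides, so the client-side setting only helps if the Cassandra server accepts frames of the same size. A sketch of the matching settings (the 50 MB value is just the example above, not a recommendation):

```
# janusgraph properties (client side)
storage.cassandra.frame-size-mb=50
```

```
# cassandra.yaml (server side) - default is 15
thrift_framed_transport_size_in_mb: 50
```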

To make JanusGraph scalable, I need a solution that does not depend on an arbitrary limit.

Two solutions spring to mind:
1. Fragment and reassemble, like other transports (e.g. TCP)
2. Node partitioning, where large rows in Cassandra are split
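To make approach 1 concrete, here is a minimal, hypothetical sketch of the fragment-and-reassemble idea: a payload larger than the frame limit is cut into fixed-size fragments and concatenated back on receipt. The class and method names (`FrameFragmenter`, `fragment`, `reassemble`, `MAX_FRAME`) are illustrative only, not part of any Thrift or JanusGraph API; a real implementation would also need framing headers, ordering, and error handling.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of approach 1: split an oversized payload into
// frames no larger than MAX_FRAME, then reassemble them in order.
public class FrameFragmenter {
    // Tiny limit for demonstration; a real transport would use megabytes.
    static final int MAX_FRAME = 4;

    static List<byte[]> fragment(byte[] payload) {
        List<byte[]> frames = new ArrayList<>();
        for (int off = 0; off < payload.length; off += MAX_FRAME) {
            int end = Math.min(off + MAX_FRAME, payload.length);
            frames.add(Arrays.copyOfRange(payload, off, end));
        }
        return frames;
    }

    static byte[] reassemble(List<byte[]> frames) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] f : frames) {
            out.write(f, 0, f.length);
        }
        return out.toByteArray();
    }
}
```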

I haven't really looked at the JanusGraph code yet, and both of these approaches may be entirely loopy, so my questions are:

1. Are there any comments on either approach?
2. Is anyone doing any similar work?
3. Are there other approaches that circumvent this problem? (Would this still be a problem with a CQL driver?)


Many thanks for any comments.