Hi,
Jan Jansen (farodin91) and I took a first try at implementing a CQL input format for Spark so that OLAP jobs can work without Thrift (#985). Unfortunately, we can't just add wrappers for the CQL OLAP support in org.apache.cassandra and be done with it. The reason is that JanusGraph still uses version 2.1.20 of org.apache.cassandra cassandra-all, which is not compatible with the version of the DataStax Cassandra driver that JanusGraph is using. So we need to update the org.apache.cassandra dependency to major version 3.
With this update, however, we would lose support for Thrift, as that was thrown out completely in version 3.0.0 (see CASSANDRA-9353). That leaves us with the following options:
- Update the dependency and drop support for Thrift. Thrift has been deprecated since Cassandra 3.0, which was released in November 2015. Users who want to continue using Thrift could stay on a JanusGraph version that still supports it.
- Create a separate project janusgraph-hadoop-cql that can use newer versions of these two dependencies without affecting the Thrift support in janusgraph-hadoop-core.
- Update the dependency and include the classes for Thrift support from the org.apache.cassandra.hadoop package in janusgraph-hadoop-core.
The first option would of course be the easiest to implement and I think that dropping support for Thrift over 3 years after deprecation started and only in a new major release of JanusGraph would be acceptable. However, this comes with some risk as we would make CQL the only way to use OLAP with Cassandra in the same version in which we introduce the CQL OLAP support. So, if we find any problems with the implementation, then users could only downgrade to a lower version of JanusGraph that still supports Thrift as there wouldn’t be an alternative input format left.
We should probably update the dependency at some point anyway, irrespective of the CQL support, so creating a new project now just to avoid updating the dependency in janusgraph-hadoop-core probably isn’t the best idea.
That is why option 3 sounds like the best one to us. We would add support for OLAP with CQL without losing support for Thrift, which we could still drop at any time in the future.
Any thoughts or other opinions on this topic?