MapReduce reindexing with authentication

Boxuan Li

We have been using a yarn cluster to run MapReduce reindexing (Cassandra + Elasticsearch) for a long time. Recently, we introduced Kerberos-based authentication to the Elasticsearch cluster, meaning that worker nodes need to authenticate via a keytab file.

We managed to achieve so by using a hadoop command to include the keytab file when submitting the MapReduce job. Hadoop automatically copies this file and distributes it to the working directory of all worker nodes. This works well for us, except that we have to make changes to MapReduceIndexManagement class so that it accepts an org.apache.hadoop.conf.Configuration object (which is created by org.apache.hadoop.util.ToolRunner) rather than instantiate one by itself. We are happy to submit a PR for this, but I would like to hear if there is any better way of handling this.


Join to automatically receive all group messages.