Re: MapReduce reindexing with authentication
hadoopmarc@...
Hi Boxuan,
Yes, you are right, I mixed things up by wrongly interpreting GENERIC_OPTIONS as an env variable. I did some additional experiments. though, bringing in new information. 1. It is possible to put a mapred-site.xml file on the JanusGraph classpath that is automatically loaded by the mapreduce client. When using the file below during mapreduce reindexing, I get the following exception (on purpose): gremlin> mr.updateIndex(i, SchemaAction.REINDEX).get() java.io.FileNotFoundException: File file:/tera/lib/janusgraph-full-0.5.3/hi.tgz does not exist The mapreduce config parameters are listed in https://hadoop.apache.org/docs/r2.7.3/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml The description for mapreduce.application.framework.path suggests that you can pass additional files to the mapreduce workers using this option (without any changes to JanusGraph). <?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <configuration> <property> <name>mapreduce.framework.name</name> <value>local</value> </property> <property> <name>mapreduce.application.classpath</name> <value>dummy</value> </property> <property> <name>mapreduce.application.framework.path</name> <value>hi.tgz</value> </property> <property> <name>mapred.map.tasks</name> <value>2</value> </property> <property> <name>mapred.reduce.tasks</name> <value>2</value> </property> </configuration> 2. When using mapreduce reindexing in the documented way, it already issues the following warning: 08:49:55 WARN org.apache.hadoop.mapreduce.JobResourceUploader - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this. When you would resolve your keytab issue by modifying the JanusGraph code and calling the hadoop ToolRunner, you have the additional advantage of getting rid of this warning. This would not work from the gremlin console, though, unless gremlin.sh passes the additional command line options to the java command line (ugly). So, I think I would prefer the option with mapred-site.xml. It would not hurt to slightly extend the mapreduce reindexing documentation, anyway:
|
|