Re: MapReduce reindexing with authentication


Boxuan Li
 

Hi Marc,

Thanks for your explanation. Just to avoid confusion: GENERIC_OPTIONS itself is not an environment variable, but a set of configuration options that Hadoop commands accept (https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CommandsManual.html#Generic_Options). These options have nothing to do with environment variables.

If I understand you correctly, you are saying that the ToolRunner interface may not be required to submit files. I haven't tried it, but I think you are right, because what it does under the hood for the "-files" option is simply

    if (line.hasOption("files")) {
        conf.set("tmpfiles", this.validateFiles(line.getOptionValue("files"), conf), "from -files command line option");
    }

The “tmpfiles” value set this way is later picked up by the Hadoop client. So, in theory, ToolRunner is not needed, and one can set the Hadoop configuration directly. This, however, does not seem to be officially documented anywhere, and there is no guarantee that the string literal “tmpfiles” will not change in future versions.
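
To make the idea concrete, here is a minimal sketch of preparing such a value by hand. It uses only the JDK, so it does not call Hadoop itself; the class name, the helper toTmpFilesValue and the temporary keytab file are hypothetical, and the exact URI form that Hadoop's own validateFiles produces may differ. I have not tested this against a real cluster.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class TmpFilesValue {
    // Key used by the Hadoop snippet quoted above; undocumented and may change.
    static final String TMPFILES_KEY = "tmpfiles";

    // Hypothetical helper: turns a comma-separated list of local paths into a
    // comma-separated list of absolute file: URIs, rejecting missing files.
    static String toTmpFilesValue(String commaSeparatedPaths) {
        List<String> uris = new ArrayList<>();
        for (String raw : commaSeparatedPaths.split(",")) {
            String trimmed = raw.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore empty entries such as a trailing comma
            }
            Path path = Paths.get(trimmed).toAbsolutePath().normalize();
            if (!Files.exists(path)) {
                throw new IllegalArgumentException("File does not exist: " + path);
            }
            uris.add(path.toUri().toString());
        }
        return String.join(",", uris);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a real keytab or config file that the job needs.
        Path keytab = Files.createTempFile("example", ".keytab");
        String value = toTmpFilesValue(keytab.toString());
        // With Hadoop on the classpath, the manual equivalent of "-files" would be:
        //   conf.set(TMPFILES_KEY, value);
        System.out.println(TMPFILES_KEY + "=" + value);
        Files.delete(keytab);
    }
}
```

The last step, calling conf.set on the Hadoop Configuration, is left as a comment because it depends on having access to the configuration object that the job is submitted with.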

Note that even if one wants to set “tmpfiles” manually for a MapReduce reindex, one still needs to modify the JanusGraph source code, because the hadoopConf object is currently created inside the MapReduceIndexManagement class and users have no control over it.

Best regards,
Boxuan

On May 15, 2021, at 4:53 PM, hadoopmarc@... wrote:

Hi Boxuan,

Yes, I did not finish my argument. What I tried to suggest: if the hadoop CLI command checks the GENERIC_OPTIONS env variable, then maybe the MapReduce Java client called by JanusGraph also checks the GENERIC_OPTIONS env variable.

The (old) blog below suggests, however, that this behavior is not present by default but requires the JanusGraph code to run Hadoop's ToolRunner. So, just see if this is any better than what you had in mind to implement.
https://hadoopi.wordpress.com/2013/06/05/hadoop-implementing-the-tool-interface-for-mapreduce-driver/

Best wishes,    Marc
