Write vertex program output to HDFS using SparkGraphComputer


Hi All,

I am running SparkGraphComputer using the TinkerPop library on YARN. I am able to run the vertex program successfully and write the final output to a mount location, but I want the program to write to HDFS instead.
We are using Hadoop 3 and TinkerPop 3.6.
When I give the output path as an HDFS location, I see the Hadoop libraries try to resolve host details from the HDFS server's DNS name and fail with an unknown host exception.
To work around that, I added a condition in the Hadoop library: if the host is the HDFS DNS name, it looks up a specific host instead, after which it starts connecting to the HDFS filesystem.
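For reference, the host mapping I hard-coded corresponds to the standard HDFS client settings, which could instead be supplied via an hdfs-site.xml on the classpath (or via HADOOP_CONF_DIR); the nameservice and host names below are placeholders for our setup:

```xml
<!-- hdfs-site.xml: "mycluster" and the namenode hosts are placeholders -->
<property><name>dfs.nameservices</name><value>mycluster</value></property>
<property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
<property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>namenode1.example.com:8020</value></property>
<property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>namenode2.example.com:8020</value></property>
<property><name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
```

With this configuration visible to the driver and executors, an hdfs://mycluster/... path should resolve without code changes.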
With that connection to HDFS working, the output folder gets created in HDFS, but the job fails while trying to write, with the exception below:
 java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS];
Could you please share thoughts/pointers to debug it? 


Hi Anjani,

Does hdfs work from the Gremlin Console? See the example at:

In particular, can you reproduce the "Using CloneVertexProgram" example from the last link?
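For example, something along these lines in the Gremlin Console (the properties file is the one from the TinkerPop distribution; adjust the path and outputLocation to your setup, e.g. an hdfs:// path):

```
gremlin> graph = GraphFactory.open('conf/hadoop-gryo.properties')
gremlin> result = graph.compute(SparkGraphComputer).program(CloneVertexProgram.build().create(graph)).submit().get()
gremlin> result.memory().getRuntime()
```

If this works against HDFS, the problem is in your job's configuration rather than in Hadoop itself.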

What is the input to SparkGraphComputer in your use case? I guess it is JanusGraph, since this is the JanusGraph user list. However, JanusGraph ships with TinkerPop 3.5, while you mention TinkerPop 3.6 above.
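Regarding the exception: "Client cannot authenticate via:[TOKEN, KERBEROS]" usually means the process has no Kerberos credentials (no ticket cache and no delegation token). With Spark 3.x, which TinkerPop 3.6 builds on, you can let Spark handle the Kerberos login by adding the principal and keytab to the same properties file you pass to SparkGraphComputer; the values below are placeholders:

```
spark.kerberos.principal=user@EXAMPLE.COM
spark.kerberos.keytab=/etc/security/keytabs/user.keytab
```

Alternatively, running kinit with that keytab before starting the driver is often enough for the driver side.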

Best wishes,     Marc


Thanks for the response, Marc. Yes, I am using JanusGraph as the input to SparkGraphComputer. We have Hadoop 3, which is not supported in TinkerPop 3.5, but per the documentation it is supported by TinkerPop 3.6, so I am trying to run with 3.6.
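For completeness, the kind of properties file I mean (JanusGraph as input, HDFS as output) looks roughly like this; the storage backend, hostnames, and paths below are placeholders for our environment:

```
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.outputLocation=hdfs://mycluster/tmp/olap-output
janusgraphmr.ioformat.conf.storage.backend=cql
janusgraphmr.ioformat.conf.storage.hostname=cassandra1.example.com
spark.master=yarn
spark.serializer=org.apache.spark.serializer.KryoSerializer
```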

I will validate hadoop-gremlin via the console.



Are you sure the JanusGraph hadoop-2.x clients cannot be used with your hadoop-3.x cluster? See: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html#Policy