Storing and reading connected component RDD through OutputFormatRDD & InputFormatRDD


anjanisingh22@...
 

Hi All,

I am using the connected component vertex program to find all the connected nodes in a graph, and then using the resulting RDD for further processing on the graph. I want to store that RDD at some output location so that I can reuse it and don't have to rerun the connected component vertex program, which is time-consuming.

I see that the TinkerPop library has OutputFormatRDD to save data. I tried:

outputFormatRDD.writeGraphRDD(graphComputerConfiguration, uniqueRDD);  ## This throws a class cast exception, because the value in the connected component vertex program's output RDD is a list, which cannot be cast to VertexWritable.

 

outputFormatRDD.writeMemoryRDD(graphComputerConfiguration, "memoryKey", uniqueRDD);  ## This saves the RDD, creating a folder named after the memory key at the output location.


However, I am not able to read the RDD back through InputFormatRDD.readMemoryRDD(), as it looks for data files in the layout expected by the SequenceFileInputFormat class.

Am I missing anything? Please let me know if you have tried any of these methods. I want to check whether we can use the out-of-the-box methods before proceeding with our own.

Thanks,
Anjani





hadoopmarc@...
 

Hi Anjani,

The following section of the TinkerPop reference docs gives an example of how to reuse the output RDD of one job in a follow-up Gremlin OLAP job.
https://tinkerpop.apache.org/docs/3.4.10/reference/#interacting-with-spark
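
For orientation, the approach in that section can be sketched as a configuration fragment. This is a rough sketch, not a tested setup: it assumes the first job writes its graph through PersistedOutputRDD and the follow-up job reads it through PersistedInputRDD, and the location name connectedComponentGraph is a hypothetical label chosen for illustration. Please check the property names against the linked section for your TinkerPop version.

```properties
# Job 1: run the vertex program and keep the resulting graph RDD in the Spark context.
# "connectedComponentGraph" is a hypothetical name, used here only for illustration.
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.spark.structure.io.PersistedOutputRDD
gremlin.hadoop.outputLocation=connectedComponentGraph
gremlin.spark.persistContext=true

# Job 2: read the persisted RDD back as the input graph for the follow-up job.
gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.spark.structure.io.PersistedInputRDD
gremlin.hadoop.inputLocation=connectedComponentGraph
gremlin.spark.persistContext=true
```

As far as I understand, an RDD persisted this way lives only as long as the Spark context stays alive, so it helps with chaining jobs in one session rather than with storing the result in files for a later application.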

Best wishes,   Marc