I have a JanusGraph Server (GitHub master, Gremlin 3.2.5) on top of a Cassandra storage backend, which stores users, items and "WHEN, WHERE, WHO bought WHAT?" relations.
To read and modify data in the graph, I use the Python aiogremlin driver mode (i.e. sessionless Groovy eval mode), and it works well so far. Thanks, developers!
Now I have to compute recommendations and forecast item sales.
Because the graph is fairly large, I want to use higher-level pyspark tools (e.g. DataFrame, ML) and Python machine learning packages (e.g. scikit-learn) for data cleaning, data normalization, recommendation and forecasting. But I cannot find a way to load the graph data into Spark. What I want is a "connector" that pyspark can use to load data from JanusGraph, not SparkGraphComputer.
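To make the target concrete, here is a minimal sketch of the tabular shape I am after. The `Purchase` record, its field names (`who`, `what`, `when`, `where`) and the helpers `purchases_to_rows` and `to_dataframe` are hypothetical names for illustration only, and the sample data is made up. It assumes the relations have already been fetched from the graph as plain dicts.

```python
from collections import namedtuple

# Hypothetical flat record for one "WHEN, WHERE, WHO bought WHAT" relation.
Purchase = namedtuple("Purchase", ["who", "what", "when", "where"])


def purchases_to_rows(records):
    """Flatten dict-like traversal results into Purchase rows."""
    return [Purchase(r["who"], r["what"], r["when"], r["where"]) for r in records]


def to_dataframe(spark, rows):
    """Build a Spark DataFrame from local rows.

    `spark` is an existing pyspark.sql.SparkSession. All rows pass
    through the driver, so this is only viable for small data.
    """
    return spark.createDataFrame(rows, schema=list(Purchase._fields))


# Made-up sample standing in for results fetched from the graph.
sample = [
    {"who": "user1", "what": "itemA", "when": "2017-10-01", "where": "shop1"},
    {"who": "user2", "what": "itemB", "when": "2017-10-02", "where": "shop2"},
]
rows = purchases_to_rows(sample)
```

Going through a local list like this means everything has to fit on the driver, which is exactly the limitation I hope a real connector would remove.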
Could someone please tell me how to do it?
- Additional info
It seems OrientDB has some Spark connectors (though I don't know whether they can be used from pyspark). But I want one for JanusGraph.
https://github.com/sbcd90/spark-orientdb
https://github.com/metreta/spark-orientdb-connector