One thing is not yet clear to me: does your graph fit on a single node (in terms of memory and GPU), or do you plan to use distributed PyTorch? Either way, I think a two-step process would be most efficient:
- export all data from JanusGraph and store it on disk in a suitable format
- run PyTorch Geometric (possibly in a distributed way) on the files on disk
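
As a rough illustration of the second step, here is a minimal sketch of turning an exported edge list into the index layout PyTorch Geometric expects. The CSV layout (a `src,dst` header), the helper names `read_edges` and `build_edge_index`, and the sample ids are all assumptions made up for this example, not anything JanusGraph produces by default. The point it shows is that vertex ids from the graph database are large and non-contiguous, so they need to be remapped to 0..N-1 before building the edge index.

```python
import csv
import io


def read_edges(handle):
    """Yield (src_id, dst_id) pairs from a CSV with a 'src,dst' header (assumed layout)."""
    for row in csv.DictReader(handle):
        yield int(row["src"]), int(row["dst"])


def build_edge_index(edge_rows):
    """Map arbitrary vertex ids to contiguous indices and return the edges in COO form."""
    id_to_index = {}
    sources, targets = [], []
    for src_id, dst_id in edge_rows:
        for vertex_id in (src_id, dst_id):
            # Assign the next free index the first time a vertex id is seen.
            if vertex_id not in id_to_index:
                id_to_index[vertex_id] = len(id_to_index)
        sources.append(id_to_index[src_id])
        targets.append(id_to_index[dst_id])
    return id_to_index, [sources, targets]


# Stand-in for a file written in step 1; the ids are made-up sample values.
exported = io.StringIO("src,dst\n4104,8200\n8200,12296\n4104,12296\n")
id_to_index, edge_index = build_edge_index(read_edges(exported))
print(id_to_index)
print(edge_index)
```

With PyTorch installed, `torch.tensor(edge_index, dtype=torch.long)` should give a tensor of shape `[2, num_edges]` that can be passed as `edge_index` to `torch_geometric.data.Data`. For a graph that does not fit in memory, the same remapping would have to be done per partition or with an on-disk id map instead of a plain dict.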
JanusGraph only supports the Hadoop InputFormats for retrieving graph data in a distributed way. Some teams have succeeded in reading data partition by partition directly from the JanusGraph storage backends (without using any JanusGraph API, see here), which could be done in a custom PyTorch loader, but this is not documented (yet).
It is cool that you are applying JanusGraph to this use case, so do not hesitate to ask for more details!