Issues while iterating over self-loop edges in Apache Spark
Mladen Marović
Hello, while debugging some Apache Spark jobs that process data from a Janusgraph graph. i noticed some issues with self-loop edges (edges that connect a vertex to itself). The data is read using:
When I try to process all outbound edges of a single vertex using:
and that vertex has multiple self-loop edges with the same edge label, the iterator always returns only one such edge. Edges that are not self-loop are all returned as expected. To give a specific example, if I have a vertex V0 with edges that E1, E2, E3, E4, E5 that lead to vertices V1, V2, V3, V4, V5, the call After further analysis, I came upon this commit: https://github.com/JanusGraph/janusgraph/commit/d3006dc939c1b640bb263806abd3fd6bee630d12 which explicitly added code that skips deserializing multiple self-loop edges. The code from the linked commit is still present in org.janusgraph:janusgraph-hadoop:0.5.3 and seems to be the cause of this unexpected behavior. My questions are as follows:
Kind regards, Mladen Marović |
|