Cross-instance communication in graphs with custom partitioning
Alain Rodriguez <al...@...>
The JanusGraph documentation suggests that a custom partitioning scheme can reduce cross-instance communication by placing vertices that are frequently traversed together on the same instance.
The relevant passage reads: "However, random partitioning results in less efficient query processing as the JanusGraph cluster grows to accommodate more graph data because of the increasing cross-instance communication required to retrieve the query’s result set."
Can someone clarify what is meant by "instance" in this context? I am assuming it refers to a storage backend instance, e.g., communication between Cassandra nodes.
What is it about placing closely related entities on the same C* machine that makes query traversal faster? Does Janus read and cache full ranges of rows from Cassandra at a time, thus increasing the probability that nearby vertices are preloaded into the cache and used in subsequent iteration steps? Or is Janus counting on C* to load and keep a nearby block of rows in its own cache?
I was under the impression that Janus executes queries in an iterative fashion, thus issuing one network request to C* per traversal hop.
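To make that mental model concrete, here is a toy sketch in Python. It is not JanusGraph code and makes no claim about how JanusGraph or Cassandra actually behave: the function name, the sample graph, and both placement maps are made up for illustration. It expands a traversal one hop at a time and counts how many traversed edges lead to a vertex stored on a different instance, once with each cluster of vertices co-located and once with vertices scattered round-robin (a deterministic stand-in for random partitioning).

```python
from itertools import combinations


def count_cross_instance_hops(edges, placement, start, hops):
    """Expand `hops` levels from `start`, one level per iteration.

    Returns the number of traversed edges whose two endpoints are
    placed on different instances (hypothetical cost model).
    """
    # Build an undirected adjacency list from the edge pairs.
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, []).append(src)

    frontier, seen, crossings = {start}, {start}, 0
    for _ in range(hops):
        next_frontier = set()
        for vertex in frontier:
            for neighbor in adjacency.get(vertex, []):
                # Following this edge means touching another instance.
                if placement[vertex] != placement[neighbor]:
                    crossings += 1
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.add(neighbor)
        frontier = next_frontier
    return crossings


# Two tightly connected groups of four vertices joined by one bridge edge.
edges = list(combinations(range(0, 4), 2))
edges += list(combinations(range(4, 8), 2))
edges.append((3, 4))

# Custom-style placement: each group lives on its own instance.
clustered = {v: 0 if v < 4 else 1 for v in range(8)}
# Scattered placement: vertices alternate between the two instances.
scattered = {v: v % 2 for v in range(8)}

clustered_crossings = count_cross_instance_hops(edges, clustered, start=0, hops=2)
scattered_crossings = count_cross_instance_hops(edges, scattered, start=0, hops=2)
print("clustered:", clustered_crossings)
print("scattered:", scattered_crossings)
```

In this toy model the clustered placement crosses instances far less often for the same two-hop traversal. What I do not understand is which real cost in JanusGraph those crossings correspond to, if each hop is a separate request to C* either way.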
Any clarification would be much appreciated!