Support for Partitioned Vertices in JanusGraphHadoop for OLAP queries
kestin...@...
Hey there,
I have been working with JanusGraph recently, and unfortunately the dataset I am dealing with is susceptible to supernodes (10+ million edges into a single vertex). It seems that partitioning vertices with a particular vertex label is the way to distribute these dense vertices across the storage backend: https://docs.janusgraph.org/latest/graph-partitioning.html However, I see that these partitioned vertices must be filtered out for OLAP queries: https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-hadoop-parent/janusgraph-hadoop-core/src/main/java/org/janusgraph/hadoop/config/JanusGraphHadoopConfiguration.java#L51-L57
Are there any plans to remove this restriction anytime soon, or is anyone currently working on this problem?
Thanks,
Joseph
Florian Hockmann <f...@...>
I'm not aware of anyone working on this right now, but supernodes are definitely a big problem for graph databases, including JanusGraph, so any improvements in that area would be a great help for many users.
Regarding graph partitioning as a countermeasure for supernodes, I just want to point out that how much it helps depends on the size of your storage backend cluster. This blog post by Ted Wilmes explains this in greater detail. It talks about DSE Graph, but basically the same applies to JanusGraph.
So, you might need to implement something yourself to work around the supernode problem, such as a bucketing approach where you split your supernodes up. If you want more information about supernodes and the impact they have on JanusGraph, we had a thread a while back on janusgraph-users on that topic.
On Friday, 2 August 2019 at 19:48:25 UTC+2, kes...@... wrote:
> Are there any plans to remove this restriction anytime soon/is there anyone currently working on this problem?
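To make the bucketing idea mentioned above a bit more concrete, here is a minimal sketch in plain Java. It is only an illustration under assumptions: the class name `SupernodeBuckets`, the `"#"` key format, and the choice of 16 buckets are all hypothetical and are not part of JanusGraph. The idea is that each edge into the supernode is attached to one of several "bucket" vertices instead, chosen by hashing the id of the vertex on the other end.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not a JanusGraph API: decides which bucket vertex
// an edge should attach to instead of the original supernode.
public final class SupernodeBuckets {
    private SupernodeBuckets() {}

    /** Maps a neighbour id to a bucket index in the range [0, numBuckets). */
    public static int bucketFor(long neighbourId, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(Long.hashCode(neighbourId), numBuckets);
    }

    /** Builds an (assumed) key for the bucket vertex, e.g. "hub#10". */
    public static String bucketVertexKey(String supernodeKey, long neighbourId, int numBuckets) {
        return supernodeKey + "#" + bucketFor(neighbourId, numBuckets);
    }

    public static void main(String[] args) {
        int numBuckets = 16; // assumption; tune to your edge counts
        Map<String, Integer> edgesPerBucket = new HashMap<>();
        for (long neighbourId = 0; neighbourId < 1_000; neighbourId++) {
            String key = bucketVertexKey("hub", neighbourId, numBuckets);
            edgesPerBucket.merge(key, 1, Integer::sum);
        }
        System.out.println(edgesPerBucket.size() + " bucket vertices used");
    }
}
```

The trade-off is that a traversal starting from the logical supernode now has to fan out over all bucket vertices, so the number of buckets should stay as small as the edge counts allow.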
kestin...@...
Thanks for your reply. I'm currently weighing two options: simply bucketing by time, or partitioning and then implementing partitioned vertex support in OLAP. We are using an HBase backend via Bigtable, so partitioning would help prevent supernodes from overrunning the Bigtable row size limits.
On Monday, August 5, 2019 at 4:10:02 AM UTC-7, Florian Hockmann wrote:
> So, you might need to implement something yourself to work around the supernode problem, like some bucketing approach where you split your supernodes up.
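For the "bucketing by time" option mentioned in the reply above, a rough sketch could look like the following. Everything here is an assumption for illustration: the class name `TimeBuckets`, the one-bucket-per-UTC-day granularity, and the `"@"` key format are hypothetical, and nothing in it is a JanusGraph API. Each edge is attached to a per-day bucket vertex derived from the edge's timestamp rather than to the supernode itself.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helper, not a JanusGraph API: derives a per-day bucket
// vertex key from the timestamp of an edge.
public final class TimeBuckets {
    private TimeBuckets() {}

    /** Returns the UTC calendar day that the edge timestamp falls on. */
    public static LocalDate dayBucket(Instant edgeTimestamp) {
        return edgeTimestamp.atZone(ZoneOffset.UTC).toLocalDate();
    }

    /** Builds an (assumed) key for the bucket vertex, e.g. "hub@2019-08-05". */
    public static String bucketVertexKey(String supernodeKey, Instant edgeTimestamp) {
        return supernodeKey + "@" + dayBucket(edgeTimestamp);
    }

    public static void main(String[] args) {
        Instant timestamp = Instant.parse("2019-08-05T11:10:02Z");
        System.out.println(bucketVertexKey("hub", timestamp));
    }
}
```

Whether a daily bucket is small enough depends on how bursty the data is: a single busy day could still produce a very wide row, so the granularity may need to be finer, or combined with a hash-based split.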