Nodes with lots of edges
Joe Obernberger
I've noticed that the max partition size in Cassandra can get extremely large if you have a node with lots of edges. The max partition size on the edgestore table of a graph I'm working on is over 1 GB. Cassandra's rule of thumb is to keep partitions under 100 MB.
Is there a way around this problem?

nodetool tablehistograms graphsource.edgestore

graphsource/edgestore histograms
Percentile   Read Latency   Write Latency   SSTables   Partition Size   Cell Count
             (micros)       (micros)                   (bytes)
50%          12108.97       17.08           6.00       770              8
75%          17436.92       24.60           6.00       1109             10
95%          17436.92       42.51           6.00       9887             42
98%          20924.30       315.85          6.00       9887             42
99%          20924.30       379.02          6.00       9887             42
Min          73.46          3.97            1.00       125              0
Max          268650.95      5839.59         6.00       1155149911       4866323

Thank you!

-Joe
hadoopmarc@...
Hi Joe,
You do not describe whether breaking this rule of thumb causes real performance issues in your case. Anyway, JanusGraph allows you to partition the stored edges of a node, see: https://docs.janusgraph.org/advanced-topics/partitioning/#vertex-cut
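For reference, a minimal sketch of what that looks like, assuming an open JanusGraph instance bound to graph (the label name is only an example):

// Sketch: create a partitioned vertex label via the management API,
// so that label's adjacency list is cut across several partitions.
mgmt = graph.openManagement()
mgmt.makeVertexLabel("source").partition().make()
mgmt.commit()

Note that partition() has to be chosen when the label is first created; as far as I know it cannot be retrofitted onto an existing label.

Marc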
Joe Obernberger
Hi Marc - yes, it takes minutes to do queries on nodes with lots of edges. Like:

:> g.V().has("somevar","someVal").outE().has("indexedField","value")

I believe this is because of the large partition size.

I would love to use vertex cutting, but there seems to be a problem with it. Every time I built a small graph and exported it to GraphML for viewing in Gephi, I would have node IDs that only existed in the edges list. I printed the node ID in my code everywhere it was used and never saw it in the output, but the GraphML had it in the edges list, and those "zombie nodes" did exist in the graph, as confirmed by gremlin queries. This was happening because I was using:

VertexLabel sourceLabel = mgmt.makeVertexLabel("source").partition().make();

Once I removed partition(), the "zombie" node IDs disappeared. I wanted to use partitioning there since those particular vertices can have a lot of edges; potentially billions.

Is there a bug with vertex cutting? Thank you!
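For reference, the export looked roughly like this (a sketch using TinkerPop's io() step; the path is illustrative):

// Sketch: write the graph out as GraphML for viewing in Gephi.
// Run from the Gremlin Console with traversal source g; the
// output path is only an example.
g.io("/tmp/graph.graphml").write().iterate()

-Joe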
Matthew Nguyen <nguyenm9@...>
Saw the same thing a while back. Boxuan put in an issue for it: https://github.com/JanusGraph/janusgraph/issues/2966
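On a small test graph, one way to spot those ids is to compare edge endpoints against g.V(), sketched here assuming a Gremlin Console session with traversal source g:

// Sketch: find ids that appear on edge endpoints but not in g.V().
// This scans all edges, so only run it on a small test graph.
knownIds = g.V().id().toSet()
zombieIds = g.E().bothV().id().dedup().toList().findAll { !knownIds.contains(it) }
println zombieIds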