Nodes with lots of edges


Joe Obernberger
 

I've noticed that the maximum partition size in Cassandra can get extremely large if you have a node with lots of edges.  The largest partition in the edgestore table of a graph I'm working on is over 1 GB (1,155,149,911 bytes, see the Max row below).  Cassandra's rule of thumb is to keep partitions under 100 MB.
Is there a way around this problem?

nodetool tablehistograms graphsource.edgestore
graphsource/edgestore histograms
Percentile  Read Latency  Write Latency  SSTables  Partition Size  Cell Count
                (micros)       (micros)                   (bytes)
50%             12108.97          17.08      6.00             770           8
75%             17436.92          24.60      6.00            1109          10
95%             17436.92          42.51      6.00            9887          42
98%             20924.30         315.85      6.00            9887          42
99%             20924.30         379.02      6.00            9887          42
Min                73.46           3.97      1.00             125           0
Max            268650.95        5839.59      6.00      1155149911     4866323
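
To put the Max row in perspective against the 100 MB rule of thumb, here is a minimal sketch that pulls the partition size out of one histogram row and reports how far over the rule it is. The class and method names are made up for illustration and are not part of Cassandra or JanusGraph. It assumes the whitespace-separated column order shown above and reads "100 MB" as 100,000,000 bytes.

```java
// Sketch only: PartitionSizeCheck is a hypothetical helper, not a Cassandra or JanusGraph API.
final class PartitionSizeCheck {
    // Rule-of-thumb ceiling from the message above, read as 100 * 10^6 bytes (an assumption).
    static final long RULE_OF_THUMB_BYTES = 100_000_000L;

    // Returns the "Partition Size (bytes)" value, the 5th whitespace-separated field of a row.
    static long partitionSizeBytes(String histogramRow) {
        String[] fields = histogramRow.trim().split("\\s+");
        return Long.parseLong(fields[4]);
    }

    // Ratio of a partition's size to the rule-of-thumb ceiling.
    static double timesOverRule(long partitionBytes) {
        return (double) partitionBytes / RULE_OF_THUMB_BYTES;
    }

    public static void main(String[] args) {
        // The Max row from the histogram above.
        String maxRow = "Max 268650.95 5839.59 6.00 1155149911 4866323";
        long bytes = partitionSizeBytes(maxRow);
        System.out.println("max partition bytes: " + bytes);
        System.out.println("times over rule of thumb: " + timesOverRule(bytes));
    }
}
```

By this measure the largest partition is more than eleven times the rule of thumb, while the 99th percentile (9887 bytes) is tiny, so the problem is limited to a few very high-degree nodes.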

Thank you!

-Joe



hadoopmarc@...
 

Hi Joe,

You do not describe whether breaking this rule of thumb causes real performance issues in your case. Anyway, JanusGraph allows you to partition the stored edges of a node, see:
https://docs.janusgraph.org/advanced-topics/partitioning/#vertex-cut

Marc
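
A rough sketch of why a vertex cut should shrink the storage partition: if the edges of one heavy vertex are spread over several partitions instead of one, each partition holds only a fraction of them. The code below is plain arithmetic under loud assumptions. It assumes an even spread, uses 32 partitions purely as an example value (the real count comes from the JanusGraph cluster configuration; I believe the relevant setting is cluster.max-partitions, but check the configuration reference), and the modulo placement in bucketFor is an illustration, not JanusGraph's actual placement logic. All names are hypothetical.

```java
// Sketch only: VertexCutEstimate is a hypothetical helper, not a JanusGraph API.
final class VertexCutEstimate {
    // Estimated bytes per partition if totalBytes of edges spread evenly (ceiling division).
    static long bytesPerPartition(long totalBytes, int partitions) {
        if (partitions < 1) {
            throw new IllegalArgumentException("partitions must be >= 1");
        }
        return (totalBytes + partitions - 1) / partitions;
    }

    // Illustrative placement: choose a bucket from the neighbour's vertex id.
    // This is NOT JanusGraph's real algorithm, just a stand-in for "split by the other endpoint".
    static int bucketFor(long neighbourVertexId, int partitions) {
        return (int) Math.floorMod(neighbourVertexId, (long) partitions);
    }

    public static void main(String[] args) {
        long maxPartitionBytes = 1_155_149_911L; // Max partition size from the histogram in this thread
        int partitions = 32;                     // example value, an assumption
        System.out.println("estimated bytes per partition: "
                + bytesPerPartition(maxPartitionBytes, partitions));
        System.out.println("bucket for neighbour id 33: " + bucketFor(33L, partitions));
    }
}
```

Under these assumptions the 1.1 GB partition would come out at roughly 36 MB per partition, which is back under the 100 MB rule of thumb. A skewed edge distribution would give less even results.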


Joe Obernberger
 

Hi Marc - yes, it takes minutes to do queries on nodes with lots of edges.  Like:

:> g.V().has("somevar","someVal").outE().has("indexedField","value")

I believe this is because of the large partition size.  I would love to use vertex cutting, but there seems to be a problem with it:

-----

Every time I built a small graph and exported it to GraphML for viewing in Gephi, I would have node IDs that existed only in the edge list.
I printed the node ID everywhere my code used it and never saw those IDs in the output, but the GraphML had them in the edge list, and those 'zombie nodes' did exist in the graph, as confirmed by Gremlin queries. This was happening because I was using:

VertexLabel sourceLabel = mgmt.makeVertexLabel("source").partition().make();

Once I removed partition(), the "zombie" node IDs disappeared.  I wanted to use partition() for that label since those particular vertices can have a lot of edges, potentially billions.

-----

Is there a bug with vertex cutting?  Thank you!

-Joe







Matthew Nguyen
 

Saw the same thing a while back.  Boxuan filed an issue for it: https://github.com/JanusGraph/janusgraph/issues/2966

