Re: Required Capacity Error - JanusGraph on Cassandra


Joe Obernberger
 

Hi Marc - as usual you are on the right path.  The number of edges on the nodes in question is very high, so any sort of query on them is slow.  The query was timing out; I'm not sure what that error message means, but when I run the same query in Gremlin, it just runs and runs.  Unfortunately, I'm ending up with lots of super nodes in this graph.

The string size of the cID property is small.
Yes - the query fails with any limit - times out.
I believe the query will time out at this point:  traversal.V().has('source',source).outE("correlation").has('type',type).has('range',range)....
There is an index on type and range.

I've not modified the partitioning.  I did try to use vertex cut (label definition below), but had some issues with node IDs that seemed to appear out of nowhere, i.e. they were never created, but appeared in the edge list.  It was odd.
VertexLabel sourceLabel = mgmt.makeVertexLabel("source").partition().make();

Looking at Cassandra, there are some very large partitions:
nodetool tablehistograms graphsource.edgestore
graphsource/edgestore histograms
Percentile      Read Latency     Write Latency          SSTables    Partition Size        Cell Count
                    (micros)          (micros)                             (bytes)
50%                  1131.75             20.50             10.00               372                 5
75%                  1358.10             29.52             10.00               372                 5
95%                  1955.67             51.01             10.00               535                 8
98%                  2816.16            315.85             10.00               924                10
99%                  4055.27            379.02             10.00              1331                12
Min                   105.78              2.76              1.00                51                 0
Max                 89970.66          36157.19             14.00        4139110981          30130992

nodetool tablehistograms graphsource.graphindex
graphsource/graphindex histograms
Percentile      Read Latency     Write Latency          SSTables    Partition Size        Cell Count
                    (micros)          (micros)                             (bytes)
50%                   182.79             20.50              0.00               124                 1
75%                   545.79             29.52              4.00               149                 1
95%                   943.13            126.93              8.00               149                 1
98%                  1358.10            219.34              8.00               179                 1
99%                  1955.67            263.21              8.00               215                 1
Min                    35.43              2.30              0.00                36                 0
Max                 12108.97          20924.30             10.00        1386179893          36157190
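
To put the Max rows in perspective, here is a minimal sketch that converts the largest partition sizes to GiB and compares them with the 50th-percentile partition size. The class and method names are hypothetical (they are not part of JanusGraph or Cassandra); the byte counts are copied from the two histograms above.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not a JanusGraph or Cassandra API.
class PartitionSizeReport {
    static final double BYTES_PER_GIB = 1024.0 * 1024.0 * 1024.0;

    // Convert a partition size in bytes to GiB.
    static double toGiB(long bytes) {
        return bytes / BYTES_PER_GIB;
    }

    // How many times larger the biggest partition is than the median one
    // (integer division, so the result is rounded down).
    static long skewRatio(long maxBytes, long medianBytes) {
        return maxBytes / medianBytes;
    }

    // One summary line per table.
    static String describe(String table, long maxBytes, long medianBytes) {
        return String.format(Locale.ROOT,
                "%s: max partition %.2f GiB, about %d times the median",
                table, toGiB(maxBytes), skewRatio(maxBytes, medianBytes));
    }

    public static void main(String[] args) {
        // Max and 50% "Partition Size" values from the nodetool output above.
        System.out.println(describe("edgestore", 4139110981L, 372L));
        System.out.println(describe("graphindex", 1386179893L, 124L));
    }
}
```

The largest edgestore partition comes out at roughly 3.85 GiB and the largest graphindex partition at roughly 1.29 GiB, each more than ten million times the median partition size, which is consistent with a small number of super nodes dominating the graph.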

-Joe

On 9/15/2022 3:59 AM, hadoopmarc@... wrote:


Hi Joe,

I have no detailed knowledge of the JanusGraph backend code myself, but here are a few questions for clarification (so that others get more hints about the cause of the issue):
  • Is it possible that the value of the cID property is very large (e.g. because it is an array/multiproperty)?
  • Does the query also fail with limit(1), limit(10), etc.?
  • Does the query also fail with the .values("cID") replaced by .id()?
  • Did you do anything special with partitioning of the graph (preferably attach the graph properties file)?

Best wishes,   Marc
