Hi Marc, as usual you are on the right path. The number of
edges on the nodes in question is very high, so any query on
them is slow. The query was timing out; I'm not sure what that
error message means, but when I run the same query in Gremlin, it
just runs and runs. Unfortunately, many of the nodes in this graph
are ending up as supernodes.
The string size of the cID property is small.
Yes, the query fails with any limit; it times out.
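One possible reason no limit helps: in a lazily evaluated pipeline, a limit of n stops reading only after n edges have passed the filter, so when matching edges are rare (or absent) the scan still walks most of the supernode's adjacency list. The toy model below uses plain Java streams, not Gremlin or JanusGraph, and the degree and match rate are made-up parameters.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.LongStream;

// Toy model (not Gremlin): how many edges a lazy "filter then limit" pipeline reads.
public class LimitSketch {

    // degree:     number of edges on the hypothetical supernode
    // matchEvery: only every matchEvery-th edge passes the filter
    // n:          the limit
    static long edgesRead(long degree, long matchEvery, long n) {
        AtomicLong read = new AtomicLong();
        LongStream.range(0, degree)
                .peek(edge -> read.incrementAndGet())                // count every edge that is read
                .filter(edge -> edge % matchEvery == matchEvery - 1) // stands in for the has(...) filters
                .limit(n)                                            // stops pulling once n edges matched
                .sum();                                              // terminal op drives the pipeline
        return read.get();
    }

    public static void main(String[] args) {
        long degree = 3_000_000L;
        // Matches are common: the limit is reached early.
        System.out.println("limit(1), 1 match per 1,000 edges: read " + edgesRead(degree, 1_000L, 1L));
        // Matches are rare: fewer than 10 exist, so every edge is read.
        System.out.println("limit(10), 1 match per 1,000,000 edges: read " + edgesRead(degree, 1_000_000L, 10L));
    }
}
```

The limit only saves work when enough matches occur early in the scan; it does not by itself narrow what has to be read.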
I believe the query will time out at this point:
traversal.V().has('source',source).outE("correlation").has('type',type).has('range',range)....
There is an index on type and range.
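To illustrate why an edge filter on a supernode can stay slow even with an index: if the index on `type` and `range` is a graph (global) index rather than a vertex-centric index on the `correlation` label, the `outE()` step may still have to read every edge of the vertex. The toy model below contrasts a full scan of one vertex's edges with a lookup in edges kept sorted by (`type`, `range`), which is the idea behind a vertex-centric index. It is plain Java, not JanusGraph code; the class, field names, and edge counts are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model (not JanusGraph code) of filtering one vertex's out-edges on two edge properties.
public class VertexCentricSketch {

    static final class Edge {
        final String type;
        final int range;
        final long inVertexId;

        Edge(String type, int range, long inVertexId) {
            this.type = type;
            this.range = range;
            this.inVertexId = inVertexId;
        }
    }

    // No vertex-centric index: every edge is read and tested, so cost grows with the degree.
    static List<Edge> scan(List<Edge> edges, String type, int range) {
        List<Edge> matches = new ArrayList<>();
        for (Edge e : edges) {
            if (e.type.equals(type) && e.range == range) {
                matches.add(e);
            }
        }
        return matches;
    }

    // Edges grouped in sorted order by (type, range), similar in spirit to a vertex-centric index.
    static Map<String, TreeMap<Integer, List<Edge>>> buildIndex(List<Edge> edges) {
        Map<String, TreeMap<Integer, List<Edge>>> index = new TreeMap<>();
        for (Edge e : edges) {
            index.computeIfAbsent(e.type, t -> new TreeMap<>())
                 .computeIfAbsent(e.range, r -> new ArrayList<>())
                 .add(e);
        }
        return index;
    }

    // Indexed lookup: two sorted-map lookups, then only the matching edges are touched.
    static List<Edge> lookup(Map<String, TreeMap<Integer, List<Edge>>> index, String type, int range) {
        TreeMap<Integer, List<Edge>> byRange = index.get(type);
        if (byRange == null) {
            return Collections.emptyList();
        }
        return byRange.getOrDefault(range, Collections.emptyList());
    }

    public static void main(String[] args) {
        // Hypothetical supernode: 100,000 correlation edges over 10 types and 100 ranges.
        List<Edge> edges = new ArrayList<>();
        for (long i = 0; i < 100_000; i++) {
            edges.add(new Edge("t" + (i % 10), (int) (i % 100), i));
        }
        Map<String, TreeMap<Integer, List<Edge>>> index = buildIndex(edges);
        System.out.println("scan read " + edges.size() + " edges, matched " + scan(edges, "t2", 42).size());
        System.out.println("index matched " + lookup(index, "t2", 42).size());
    }
}
```

Both paths return the same edges; the difference is that the scan's cost is tied to the vertex degree, while the sorted lookup's cost is tied mostly to the number of matches.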
I haven't modified the partitioning. I did try using vertex cut,
but ran into issues with node IDs that seemed to appear out of
nowhere, i.e. they were never created, but showed up in the edge
list. It was odd. For the vertex-cut attempt, the label was defined as:
VertexLabel sourceLabel = mgmt.makeVertexLabel("source").partition().make();
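Calling `partition()` on a vertex label is what enables vertex-cut partitioning for vertices with that label. The toy model below sketches the intended storage effect: a supernode's edges are split across a fixed number of partitions, so each storage row holds only a slice of them. The partition count of 32 and the modulo-based assignment are assumptions for illustration, not necessarily how JanusGraph places the edges.

```java
// Toy model of vertex-cut partitioning (not JanusGraph's implementation).
public class VertexCutSketch {

    // Hypothetical rule: an edge goes to the partition derived from the other endpoint's id.
    static int partitionFor(long otherVertexId, int partitions) {
        return (int) Math.floorMod(otherVertexId, (long) partitions);
    }

    // Number of edges stored in each partition for a supernode of the given degree.
    static long[] edgesPerPartition(long degree, int partitions) {
        long[] counts = new long[partitions];
        for (long id = 0; id < degree; id++) {
            counts[partitionFor(id, partitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int partitions = 32;
        long degree = 3_200_000L;
        long[] counts = edgesPerPartition(degree, partitions);
        long largest = 0;
        for (long c : counts) {
            largest = Math.max(largest, c);
        }
        System.out.println("unpartitioned row: " + degree + " edges");
        System.out.println("largest of " + partitions + " slices: " + largest + " edges");
    }
}
```

The total number of edges is unchanged, so a traversal over all of a partitioned vertex's edges still has to visit every slice; the gain is smaller rows, not less total work.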
Looking at Cassandra, there are some very large partitions (the largest is about 4.1 GB in edgestore and about 1.4 GB in graphindex):
nodetool tablehistograms graphsource.edgestore
graphsource/edgestore histograms
Percentile  Read Latency  Write Latency  SSTables  Partition Size  Cell Count
            (micros)      (micros)                 (bytes)
50%         1131.75       20.50          10.00     372             5
75%         1358.10       29.52          10.00     372             5
95%         1955.67       51.01          10.00     535             8
98%         2816.16       315.85         10.00     924             10
99%         4055.27       379.02         10.00     1331            12
Min         105.78        2.76           1.00      51              0
Max         89970.66      36157.19       14.00     4139110981      30130992
nodetool tablehistograms graphsource.graphindex
graphsource/graphindex histograms
Percentile  Read Latency  Write Latency  SSTables  Partition Size  Cell Count
            (micros)      (micros)                 (bytes)
50%         182.79        20.50          0.00      124             1
75%         545.79        29.52          4.00      149             1
95%         943.13        126.93         8.00      149             1
98%         1358.10       219.34         8.00      179             1
99%         1955.67       263.21         8.00      215             1
Min         35.43         2.30           0.00      36              0
Max         12108.97      20924.30       10.00     1386179893      36157190
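To put the `Max` rows into more familiar units, the sketch below parses one whitespace-separated data row of the `tablehistograms` output, assuming the column order Percentile, Read Latency, Write Latency, SSTables, Partition Size, Cell Count, and flags partitions over a 100 MiB threshold. The threshold is an assumed rule of thumb, not a value taken from this cluster's configuration.

```java
import java.util.Locale;

// Reads partition size and cell count from one data row of `nodetool tablehistograms` output.
// Assumed column order: Percentile ReadLatency WriteLatency SSTables PartitionSize(bytes) CellCount
public class PartitionSizeCheck {

    // Assumed rule-of-thumb threshold for a "large" partition: 100 MiB.
    static final long THRESHOLD_BYTES = 100L * 1024 * 1024;

    static long partitionBytes(String row) {
        return Long.parseLong(row.trim().split("\\s+")[4]);
    }

    static long cellCount(String row) {
        return Long.parseLong(row.trim().split("\\s+")[5]);
    }

    static boolean isLarge(String row) {
        return partitionBytes(row) > THRESHOLD_BYTES;
    }

    public static void main(String[] args) {
        String[][] maxRows = {
            {"edgestore", "Max 89970.66 36157.19 14.00 4139110981 30130992"},
            {"graphindex", "Max 12108.97 20924.30 10.00 1386179893 36157190"},
        };
        for (String[] entry : maxRows) {
            String row = entry[1];
            double gib = partitionBytes(row) / (1024.0 * 1024 * 1024);
            System.out.printf(Locale.ROOT, "%s: %.2f GiB, %,d cells, large=%b%n",
                    entry[0], gib, cellCount(row), isLarge(row));
        }
    }
}
```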
-Joe
Hi Joe,
I have no detailed knowledge of the JanusGraph backend code
myself, but here are a few questions for clarification (so that others
get more hints about the cause of the issue):
- Is it possible that the value of the cID property is very
large (e.g. because it is an array/multiproperty)?
- Does the query also fail with limit(1), limit(10), etc.?
- Does the query also fail with .values("cID") replaced by
.id()?
- Did you do anything special with the partitioning of the graph
(ideally, attach the graph properties file)?
Best wishes, Marc