Re: Getting Edges - Performance
One thing of note is the tablehistogram for the graphindex table:
nodetool tablehistograms
graphsource.graphindex
graphsource/graphindex histograms
Percentile Read Latency Write Latency
SSTables Partition Size Cell Count
(micros)
(micros) (bytes)
50% 152.32 35.43
0.00 179 1
75% 545.79 51.01
2.00 179 1
95% 943.13 61.21
4.00 215 1
98% 1131.75 73.46
5.00 215 1
99% 1131.75 88.15
5.00 215 1
Min 42.51 11.87
0.00 36 0
Max 1131.75 88.15
5.00 802187438 25109160
Stats:
nodetool tablestats graphsource.graphindex
Total number of tables: 95
----------------
Keyspace : graphsource
Read Count: 28106793
Read Latency: 2.0613708762860283 ms
Write Count: 54754221
Write Latency: 0.089838214609975 ms
Pending Flushes: 0
Table: graphindex
SSTable count: 25
Old SSTable count: 0
Space used (live): 2858126352
Space used (total): 2858126352
Space used by snapshots (total): 2032016596
Off heap memory used (total): 22737395
SSTable Compression Ratio: 0.4225597463117299
Number of partitions (estimate): 10030080
Memtable cell count: 6363776
Memtable data size: 217020150
Memtable off heap memory used: 0
Memtable switch count: 77
Local read count: 11659961
Local read latency: 0.315 ms
Local write count: 41850997
Local write latency: 0.039 ms
Pending flushes: 0
Percent repaired: 0.0
Bytes repaired: 0.000KiB
Bytes unrepaired: 3.628GiB
Bytes pending repair: 0.000KiB
Bloom filter false positives: 40827
Bloom filter false ratio: 0.00000
Bloom filter space used: 12591976
Bloom filter off heap memory used: 12591776
Index summary off heap memory used: 9669843
Compression metadata off heap memory used:
475776
Compacted partition minimum bytes: 36
Compacted partition maximum bytes: 802187438
Compacted partition mean bytes: 423
Average live cells per slice (last five
minutes): 1.0
Maximum live cells per slice (last five
minutes): 1
Average tombstones per slice (last five
minutes): 1.0
Maximum tombstones per slice (last five
minutes): 1
Dropped Mutations: 0
Droppable tombstone ratio: 0.00000
It looks like the index is not well partitioned?
-Joe
Thank you Boxuan - the code (REST service) that is modifying the graph is being called continuously when running. A slow example looks like this:
Metrics: Traversal Metrics
Step Count Traversers Time (ms) % Dur
=============================================================================================================
JanusGraphStep([],[edgeID.eq(DS_106|geotime|hou... 1 1 1393.727 100.00
constructGraphCentricQuery 0.011
GraphCentricQuery 1393.698
\_condition=(edgeID = DS_106|geotime|hour 1km|2022-6-14T4|18SVK 86 22|Date,Actor2Geo_Long_Actor2Geo_Lat)
\_orders=[]
\_isFitted=true
\_isOrdered=true
\_query=multiKSQ[1]
\_index=byEdgeIDComposite
backend-query 1 1393.128
\_query=byEdgeIDComposite:multiKSQ[1]
>TOTAL - - 1393.727 -
Code is:
System.out.println("Metrics: "+traversal.E().has("edgeID", edgeID).profile().next().toString());
dataSourceToCorrelationEdge = traversal.E().has("edgeID", edgeID).next();If the time to run that exceeds 500mSec, I print the message.
-Joe
On 6/23/2022 4:15 PM, Boxuan Li wrote:
Profiler documentation is available here: https://tinkerpop.apache.org/docs/current/reference/#profile-step
Can you do
traversal.E().has("edgeID", edgeID).profile().next()
and paste the output (when you experience the slowness) here?
Best,Boxuan
On Jun 23, 2022, at 2:29 PM, Joe Obernberger via lists.lfaidata.foundation <joseph.obernberger=gmail.com@...> wrote:
Looking for documentation on how to do profile() - do you have any?
Queries like this are also slow:Edge dataSourceToCorrelationEdge = traversal.E().has("edgeID", edgeID).next();
There is an index on edgeID (mgmt.buildIndex("byEdgeIDComposite", Edge.class).addKey(edgeID).unique().buildCompositeIndex();)
I'm trying to cache the actual JanusGraph edgeID string in an outside cache to help with this, but when it's not in the cache, it can take 2 seconds to return.
-Joe
On 6/22/2022 4:53 PM, Boxuan Li wrote:
It's very suspicious. It shouldn't take 3 seconds just to load 3 edges. Can you provide theprofile()
output here when you see such slowness? It is also worth trying if other queries (e.g. loading a vertex) experience similar slowness.
From: janusgraph-users@... <janusgraph-users@...> on behalf of Joe Obernberger via lists.lfaidata.foundation<joseph.obernberger=gmail.com@...>
Sent: Wednesday, June 22, 2022 4:51 PM
To: janusgraph-users@... <janusgraph-users@...>
Subject: Re: [janusgraph-users] Getting Edges - PerformanceIt's usually small; typically around 3 edges.
Reading your article (https://li-boxuan.medium.com/janusgraph-deep-dive-part-3-speed-up-edge-queries-3b9eb5ba34f8).Outbound edges could be large - 100s to 10000s.
-Joe
On 6/22/2022 4:49 PM, Boxuan Li wrote:
When it takes over 3 seconds to return, how many edges does it return?
From: janusgraph-users@... <janusgraph-users@...> on behalf of Joe Obernberger via lists.lfaidata.foundation<joseph.obernberger=gmail.com@...>
Sent: Wednesday, June 22, 2022 1:06 PM
To: janusgraph-users@... <janusgraph-users@...>
Subject: [janusgraph-users] Getting Edges - PerformanceHi All - I'm seeing a performance issue with this statement in Java code:
Iterator<Edge> edgeIt = vert.edges(Direction.IN);
in some cases this is taking over 3 seconds to return. What can I do to
better this performance?
Thank you!
-Joe
--
This email has been checked for viruses by AVG.
https://www.avg.com