Hi All - I'm seeing a performance issue with this statement in Java code:
Iterator<Edge> edgeIt = vert.edges(Direction.IN);
In some cases this takes over 3 seconds to return. What can I do to improve this performance?
Thank you!
-Joe
When it takes over 3 seconds to return, how many edges does it return?
It's usually small; typically around 3 edges. I'm reading your article (https://li-boxuan.medium.com/janusgraph-deep-dive-part-3-speed-up-edge-queries-3b9eb5ba34f8). Outbound edges could be large - hundreds to tens of thousands.
-Joe
It's very suspicious - it shouldn't take 3 seconds just to load 3 edges. Can you provide the profile() output here when you see such slowness? It is also worth checking whether other queries (e.g. loading a vertex) experience similar slowness.
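
A minimal sketch of such a side-by-side check (illustrative only; `g` is assumed to be the GraphTraversalSource and `vert` the vertex in question):

import java.util.Iterator;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.Direction;
import org.apache.tinkerpop.gremlin.structure.Edge;
import org.apache.tinkerpop.gremlin.structure.Vertex;

// Sketch: time the IN-edge iteration next to a plain vertex reload,
// to see whether the slowness is specific to the edge query.
static void compareTimings(GraphTraversalSource g, Vertex vert) {
    long t0 = System.nanoTime();
    Iterator<Edge> edgeIt = vert.edges(Direction.IN);
    int count = 0;
    while (edgeIt.hasNext()) { edgeIt.next(); count++; }   // force the fetch; edges load lazily
    long t1 = System.nanoTime();
    g.V(vert.id()).next();                                 // reload the same vertex by id
    long t2 = System.nanoTime();
    System.out.printf("in-edges: %d in %d ms, vertex reload: %d ms%n",
            count, (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
}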
Looking for documentation on how to do profile() - do you have any?
Queries like this are also slow:
Edge dataSourceToCorrelationEdge = traversal.E().has("edgeID", edgeID).next();
There is an index on edgeID:
mgmt.buildIndex("byEdgeIDComposite", Edge.class).addKey(edgeID).unique().buildCompositeIndex();
I'm trying to cache the actual JanusGraph edgeID string in an outside cache to help with this, but when it's not in the cache, it can take 2 seconds to return.
-Joe
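
The lookaside cache described here could look roughly like the sketch below (hypothetical names `edgeIdCache`/`findEdge`; `traversal` is the same GraphTraversalSource used elsewhere). On a hit it looks the edge up directly by its graph-assigned id instead of going through the composite index:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.Edge;

// Sketch: map the application-level edgeID property to the edge's
// graph-assigned id, so cache hits skip the index lookup entirely.
static final Map<String, Object> edgeIdCache = new ConcurrentHashMap<>();

static Edge findEdge(GraphTraversalSource traversal, String edgeID) {
    Object cachedId = edgeIdCache.get(edgeID);
    if (cachedId != null) {
        return traversal.E(cachedId).next();               // direct id lookup
    }
    Edge e = traversal.E().has("edgeID", edgeID).next();   // falls back to byEdgeIDComposite
    edgeIdCache.put(edgeID, e.id());
    return e;
}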
Profiler documentation is available here: https://tinkerpop.apache.org/docs/current/reference/#profile-step
Can you do
traversal.E().has("edgeID", edgeID).profile().next()
and paste the output (when you experience the slowness) here?
Best,
Boxuan
Thank you Boxuan - the code (REST service) that is modifying the graph is being called continuously when running. A slow example looks like this:

Metrics: Traversal Metrics
Step                                                  Count  Traversers   Time (ms)    % Dur
=============================================================================================
JanusGraphStep([],[edgeID.eq(DS_106|geotime|hou...        1           1    1393.727   100.00
  constructGraphCentricQuery                                                   0.011
  GraphCentricQuery                                                         1393.698
    \_condition=(edgeID = DS_106|geotime|hour 1km|2022-6-14T4|18SVK 86 22|Date,Actor2Geo_Long_Actor2Geo_Lat)
    \_orders=[]
    \_isFitted=true
    \_isOrdered=true
    \_query=multiKSQ[1]
    \_index=byEdgeIDComposite
  backend-query                                            1                1393.128
    \_query=byEdgeIDComposite:multiKSQ[1]
                                            >TOTAL         -           -    1393.727        -

Code is:
System.out.println("Metrics: " + traversal.E().has("edgeID", edgeID).profile().next().toString());
dataSourceToCorrelationEdge = traversal.E().has("edgeID", edgeID).next();
If the time to run that exceeds 500 ms, I print the message.
-Joe
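
A sketch of that logging pattern (method name and threshold are illustrative; `traversal` is the service's GraphTraversalSource):

import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.Edge;

// Sketch: run the lookup, and print the profile() metrics only when
// the call took longer than 500 ms.
static Edge timedLookup(GraphTraversalSource traversal, String edgeID) {
    long start = System.currentTimeMillis();
    Edge e = traversal.E().has("edgeID", edgeID).next();
    long tookMs = System.currentTimeMillis() - start;
    if (tookMs > 500) {
        System.out.println("Took: " + tookMs + "mSec - Metrics: "
                + traversal.E().has("edgeID", edgeID).profile().next());
    }
    return e;
}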
One thing of note is the tablehistogram for the graphindex table:

nodetool tablehistograms graphsource.graphindex
graphsource/graphindex histograms
Percentile      Read Latency     Write Latency     SSTables    Partition Size        Cell Count
                    (micros)          (micros)                        (bytes)
50%                   152.32             35.43         0.00              179                  1
75%                   545.79             51.01         2.00              179                  1
95%                   943.13             61.21         4.00              215                  1
98%                  1131.75             73.46         5.00              215                  1
99%                  1131.75             88.15         5.00              215                  1
Min                    42.51             11.87         0.00               36                  0
Max                  1131.75             88.15         5.00        802187438          25109160

Stats:

nodetool tablestats graphsource.graphindex
Total number of tables: 95
----------------
Keyspace : graphsource
        Read Count: 28106793
        Read Latency: 2.0613708762860283 ms
        Write Count: 54754221
        Write Latency: 0.089838214609975 ms
        Pending Flushes: 0
                Table: graphindex
                SSTable count: 25
                Old SSTable count: 0
                Space used (live): 2858126352
                Space used (total): 2858126352
                Space used by snapshots (total): 2032016596
                Off heap memory used (total): 22737395
                SSTable Compression Ratio: 0.4225597463117299
                Number of partitions (estimate): 10030080
                Memtable cell count: 6363776
                Memtable data size: 217020150
                Memtable off heap memory used: 0
                Memtable switch count: 77
                Local read count: 11659961
                Local read latency: 0.315 ms
                Local write count: 41850997
                Local write latency: 0.039 ms
                Pending flushes: 0
                Percent repaired: 0.0
                Bytes repaired: 0.000KiB
                Bytes unrepaired: 3.628GiB
                Bytes pending repair: 0.000KiB
                Bloom filter false positives: 40827
                Bloom filter false ratio: 0.00000
                Bloom filter space used: 12591976
                Bloom filter off heap memory used: 12591776
                Index summary off heap memory used: 9669843
                Compression metadata off heap memory used: 475776
                Compacted partition minimum bytes: 36
                Compacted partition maximum bytes: 802187438
                Compacted partition mean bytes: 423
                Average live cells per slice (last five minutes): 1.0
                Maximum live cells per slice (last five minutes): 1
                Average tombstones per slice (last five minutes): 1.0
                Maximum tombstones per slice (last five minutes): 1
                Dropped Mutations: 0
                Droppable tombstone ratio: 0.00000

It looks like the index is not well partitioned?
-Joe
Your Cassandra statistics show pretty low latency, so the problem does not seem to be related to Cassandra. I would check if it's a network problem. It might also be helpful if you could provide more details, e.g.:
1. How is your cluster set up? Is the JanusGraph instance in the same data center as your Cassandra cluster?
2. How is your workload? QPS?
3. How often do you see such slowness? What are the p50, p90, p99 latencies of such queries?
Best,
Boxuan
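
(For reference, p50/p90/p99 are percentile latencies - the time under which 50%, 90%, and 99% of queries complete. A rough, purely illustrative way to compute them from recorded timings:)

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: given each query's elapsed time in ms, report a percentile.
static long percentile(List<Long> latenciesMs, double p) {
    List<Long> sorted = new ArrayList<>(latenciesMs);
    Collections.sort(sorted);
    int idx = (int) Math.ceil(p / 100.0 * sorted.size()) - 1;
    return sorted.get(Math.max(idx, 0));
}
// e.g. percentile(latencies, 50), percentile(latencies, 90), percentile(latencies, 99)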
Hi Boxuan -
The cluster is a 15-node Cassandra cluster:

nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Tokens  Owns  Host ID                               Rack
UN  172.16.100.45   961.03 GiB  250     ?     07bccfce-45f1-41a3-a5c4-ee748a7a9b98  rack1
UN  172.16.100.251  946.8 GiB   200     ?     274a6e8d-de37-4e0b-b000-02d221d858a5  rack1
UN  172.16.100.35   688.11 GiB  200     ?     59150c47-274a-46fb-9d5e-bed468d36797  rack1
UN  172.16.100.252  909.26 GiB  200     ?     e83aa851-69b4-478f-88f6-60e657ea6539  rack1
UN  172.16.100.249  1.01 TiB    200     ?     49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
UN  172.16.100.38   809.11 GiB  200     ?     0d9509cc-2f23-4117-a883-469a1be54baf  rack1
UN  172.16.100.36   909.15 GiB  200     ?     d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  172.16.100.39   990.91 GiB  200     ?     93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  172.16.100.248  867.42 GiB  200     ?     4bbbe57c-6219-41e5-bbac-de92a9594d53  rack1
UN  172.16.100.44   891.53 GiB  200     ?     b2e5366e-8386-40ec-a641-27944a5a7cfa  rack1
UN  172.16.100.37   493.34 GiB  120     ?     08a19658-40be-4e55-8709-812b3d4ac750  rack1
UN  172.16.100.250  972.12 GiB  200     ?     b74b6e65-af63-486a-b07f-9e304ec30a39  rack1
UN  172.16.20.16    1.84 TiB    500     ?     1ccd2cc5-3ee5-43c5-a8c3-7065bdc24297  rack1
UN  172.16.100.34   782.35 GiB  200     ?     352fd049-32f8-4be8-9275-68b145ac2832  rack1
UN  172.16.100.42   1.81 TiB    500     ?     b088a8e6-42f3-4331-a583-47ef5149598f  rack1
JanusGraph instance? The Java code talks directly to Cassandra via the JanusGraph library. Should I be using a Gremlin Server instead?
Right now, I'm trying to load data into the graph as quickly as possible. The service that does that needs to query the graph many times in order to put in the correct nodes/edges. Slowness happens frequently:
Took: 965mSec - [DS_106_2022-06-18T12-15-58.184Z_4iaiTwYa] - num handleCID calls: 234 edgeHits: 819 vertHits: 231 - eCacheSize: 3 vertCacheSize: 3 vertMiss: 4 edgeMiss: 6
Took: 2072mSec - [DS_106_2022-06-18T10-20-54.955Z_VHf1MKA2] - num handleCID calls: 701 edgeHits: 2467 vertHits: 697 - eCacheSize: 3 vertCacheSize: 3 vertMiss: 3 edgeMiss: 2
Took: 2203mSec - [DS_106_2022-06-18T09-44-20.106Z_jUcshXdg] - num handleCID calls: 511 edgeHits: 1735 vertHits: 507 - eCacheSize: 3 vertCacheSize: 3 vertMiss: 5 edgeMiss: 3
Took: 1925mSec - [DS_106_2022-06-18T12-15-58.205Z_DFnPCE0w] - num handleCID calls: 157 edgeHits: 499 vertHits: 155 - eCacheSize: 2 vertCacheSize: 2 vertMiss: 2 edgeMiss: 1
Took: 94mSec - [DS_91_2022-06-18T12-15-58.724Z_P6nyGwbl] - num handleCID calls: 124 edgeHits: 407 vertHits: 124 - eCacheSize: 3 vertCacheSize: 3 vertMiss: 2 edgeMiss: 3
Took: 846mSec - [DS_106_2022-06-18T10-20-54.985Z_XO7LVfDU] - num handleCID calls: 52 edgeHits: 170 vertHits: 50 - eCacheSize: 2 vertCacheSize: 2 vertMiss: 2 edgeMiss: 3
Took: 675mSec - [DS_106_2022-06-18T09-44-19.922Z_d2qo7ioj] - num handleCID calls: 650 edgeHits: 2612 vertHits: 650 - eCacheSize: 3 vertCacheSize: 3 vertMiss: 2 edgeMiss: 5
Took: 705mSec - [DS_106_2022-06-18T12-15-58.724Z_U5seNz1V] - num handleCID calls: 273 edgeHits: 926 vertHits: 272 - eCacheSize: 3 vertCacheSize: 3 vertMiss: 3 edgeMiss: 1
Took: 687mSec - [DS_106_2022-06-18T10-21-01.336Z_7kToNeIj] - num handleCID calls: 29 edgeHits: 44 vertHits: 24 - eCacheSize: 2 vertCacheSize: 2 vertMiss: 5 edgeMiss: 2
Took: 4024mSec - [DS_106_2022-06-18T09-44-19.846Z_Dc3sFChv] - num handleCID calls: 777 edgeHits: 3233 vertHits: 773 - eCacheSize: 3 vertCacheSize: 3 vertMiss: 5 edgeMiss: 11
Took: 4641mSec - [DS_106_2022-06-18T10-21-03.304Z_bup3D681] - num handleCID calls: 307 edgeHits: 1272 vertHits: 300 - eCacheSize: 3 vertCacheSize: 3 vertMiss: 8 edgeMiss: 15
Took: 4665mSec - [DS_106_2022-06-18T12-16-02.292Z_XwMKRm0i] - num handleCID calls: 310 edgeHits: 1290 vertHits: 303 - eCacheSize: 3 vertCacheSize: 3 vertMiss: 8 edgeMiss: 15
Took: 1939mSec - [DS_106_2022-06-18T09-44-21.925Z_uGFYTu1V] - num handleCID calls: 383 edgeHits: 1350 vertHits: 379 - eCacheSize: 3 vertCacheSize: 3 vertMiss: 4 edgeMiss: 5
Took: 3032mSec - [DS_106_2022-06-18T10-21-14.617Z_TbwWwuSG] - num handleCID calls: 826 edgeHits: 2900 vertHits: 823 - eCacheSize: 3 vertCacheSize: 3 vertMiss: 5 edgeMiss: 5
Took: 3217mSec - [DS_106_2022-06-18T12-16-02.275Z_xBhlZITR] - num handleCID calls: 183 edgeHits: 580 vertHits: 179 - eCacheSize: 3 vertCacheSize: 2 vertMiss: 4 edgeMiss: 2
Took: 2606mSec - [DS_106_2022-06-18T09-44-21.877Z_YClmiFCl] - num handleCID calls: 335 edgeHits: 1156 vertHits: 332 - eCacheSize: 3 vertCacheSize: 3 vertMiss: 4 edgeMiss: 3
Took: 2276mSec - [DS_106_2022-06-18T10-21-14.630Z_bv48tkF8] - num handleCID calls: 507 edgeHits: 1782 vertHits: 504 - eCacheSize: 3 vertCacheSize: 3 vertMiss: 4 edgeMiss: 3
Took: 1908mSec - [DS_106_2022-06-18T09-44-20.254Z_XhGH0KI4] - num handleCID calls: 583 edgeHits: 2153 vertHits: 581 - eCacheSize: 3 vertCacheSize: 3 vertMiss: 3 edgeMiss: 6
Took: 2463mSec - [DS_106_2022-06-18T12-16-07.498Z_acwPqSnu] - num handleCID calls: 48 edgeHits: 162 vertHits: 48 - eCacheSize: 2 vertCacheSize: 2 vertMiss: 1 edgeMiss: 6
Took: 386mSec - [DS_106_2022-06-18T10-21-15.651Z_KWOek5Nm] - num handleCID calls: 11 edgeHits: 19 vertHits: 8 - eCacheSize: 2 vertCacheSize: 2 vertMiss: 4 edgeMiss: 5
Took: 78mSec - [DS_91_2022-06-18T12-16-10.616Z_jQPoltfT] - num handleCID calls: 56 edgeHits: 268 vertHits: 54 - eCacheSize: 3 vertCacheSize: 2 vertMiss: 2 edgeMiss: 2
Took: 64mSec - [DS_91_2022-06-18T12-16-18.846Z_vzIEpQFH] - num handleCID calls: 96 edgeHits: 278 vertHits: 91 - eCacheSize: 3 vertCacheSize: 3 vertMiss: 6 edgeMiss: 3
Took: 1425mSec - [DS_106_2022-06-18T09-44-21.953Z_aOwVhMAS] - num handleCID calls: 979 edgeHits: 4172 vertHits: 977 - eCacheSize: 3 vertCacheSize: 3 vertMiss: 4 edgeMiss: 7
Each handleCID call queries JanusGraph (Cassandra) several times, but I'm using internal hashmaps of Vertex and Edge to help speed that up.
Could you explain p50, p90, p99 latency?
-Joe
Hi Joe,

> The java code talks directly to cassandra via the Janusgraph library. Should I be using a Gremlin server instead?

I see, so you are using embedded JanusGraph. It has less overhead compared to the Gremlin Server, which is an advantage. That being said, if your Java programs are not running in the same region as your Cassandra cluster, the latency could be large. In that case, it is better to use a Gremlin Server that sits in the same data center as Cassandra.

> Slowness happens frequently. Could you explain p50, p90, p99 latency?

This is a quick intro: https://readosapien.com/what-is-p50-p90-p99-latency/. In short, I would like to quantitatively learn what you mean by "slowness happens frequently".

> I'm trying to load data into the graph as quickly as possible

I am assuming your throughput is large. You might be hitting the limit of a JanusGraph internal thread pool. Try disabling `storage.cql.executor-service.enabled`. See https://docs.janusgraph.org/configs/configuration-reference/#storagecqlexecutor-service

Best,
Boxuan
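
For reference, that option can be set where the embedded graph is opened. A minimal sketch (host list and keyspace are placeholders taken from this thread; the option requires JanusGraph 0.6 or newer):

import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

public class OpenGraph {
    public static void main(String[] args) {
        // Sketch: open embedded JanusGraph against CQL with the
        // CQL executor-service thread pool disabled.
        JanusGraph graph = JanusGraphFactory.build()
                .set("storage.backend", "cql")
                .set("storage.hostname", "172.16.100.34,172.16.100.35,172.16.100.36")
                .set("storage.cql.keyspace", "graphsource")
                .set("storage.cql.executor-service.enabled", false)
                .open();
        System.out.println("Opened " + graph);
        graph.close();
    }
}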