Re: Getting Edges - Performance


Boxuan Li

Your Cassandra statistics show pretty low latency, so the problem does not seem to be related to Cassandra itself. I would check whether it's a network problem. It would also help if you could provide more details, e.g.:

1. How is your cluster set up? Is your JanusGraph instance in the same data center as your Cassandra cluster?
2. What does your workload look like? What QPS are you running?
3. How often do you see such slowness? What are the p50, p90, and p99 latencies for this kind of query? (See the timing sketch below.)
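
If you don't already track these numbers, here is a minimal client-side sketch for collecting them (the class and method names are illustrative, not a JanusGraph or TinkerPop API):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

public class LatencySampler {
    // Collected per-query latencies in milliseconds.
    private final List<Long> samplesMillis = Collections.synchronizedList(new ArrayList<>());

    // Runs the query, records how long it took, and returns its result.
    public <T> T time(Supplier<T> query) {
        long start = System.nanoTime();
        try {
            return query.get();
        } finally {
            samplesMillis.add((System.nanoTime() - start) / 1_000_000);
        }
    }

    // Returns the p-th percentile (e.g. 50, 90, 99) of the recorded latencies.
    public long percentile(double p) {
        List<Long> sorted = new ArrayList<>(samplesMillis);
        Collections.sort(sorted);
        if (sorted.isEmpty()) {
            return 0L;
        }
        int idx = (int) Math.ceil(p / 100.0 * sorted.size()) - 1;
        return sorted.get(Math.max(idx, 0));
    }
}

Usage would be along the lines of: Edge e = sampler.time(() -> traversal.E().has("edgeID", edgeID).next()); and, once enough samples have accumulated, sampler.percentile(99).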

Best,
Boxuan

On Jun 23, 2022, at 5:03 PM, Joe Obernberger via lists.lfaidata.foundation <joseph.obernberger=gmail.com@...> wrote:

One thing of note is the tablehistograms output for the graphindex table:

nodetool tablehistograms graphsource.graphindex
graphsource/graphindex histograms
Percentile      Read Latency     Write Latency          SSTables    Partition Size        Cell Count
                    (micros)          (micros)                             (bytes)
50%                   152.32             35.43              0.00               179                 1
75%                   545.79             51.01              2.00               179                 1
95%                   943.13             61.21              4.00               215                 1
98%                  1131.75             73.46              5.00               215                 1
99%                  1131.75             88.15              5.00               215                 1
Min                    42.51             11.87              0.00                36                 0
Max                  1131.75             88.15              5.00         802187438          25109160

Stats:

nodetool tablestats graphsource.graphindex
Total number of tables: 95
----------------
Keyspace : graphsource
        Read Count: 28106793
        Read Latency: 2.0613708762860283 ms
        Write Count: 54754221
        Write Latency: 0.089838214609975 ms
        Pending Flushes: 0
                Table: graphindex
                SSTable count: 25
                Old SSTable count: 0
                Space used (live): 2858126352
                Space used (total): 2858126352
                Space used by snapshots (total): 2032016596
                Off heap memory used (total): 22737395
                SSTable Compression Ratio: 0.4225597463117299
                Number of partitions (estimate): 10030080
                Memtable cell count: 6363776
                Memtable data size: 217020150
                Memtable off heap memory used: 0
                Memtable switch count: 77
                Local read count: 11659961
                Local read latency: 0.315 ms
                Local write count: 41850997
                Local write latency: 0.039 ms
                Pending flushes: 0
                Percent repaired: 0.0
                Bytes repaired: 0.000KiB
                Bytes unrepaired: 3.628GiB
                Bytes pending repair: 0.000KiB
                Bloom filter false positives: 40827
                Bloom filter false ratio: 0.00000
                Bloom filter space used: 12591976
                Bloom filter off heap memory used: 12591776
                Index summary off heap memory used: 9669843
                Compression metadata off heap memory used: 475776
                Compacted partition minimum bytes: 36
                Compacted partition maximum bytes: 802187438
                Compacted partition mean bytes: 423
                Average live cells per slice (last five minutes): 1.0
                Maximum live cells per slice (last five minutes): 1
                Average tombstones per slice (last five minutes): 1.0
                Maximum tombstones per slice (last five minutes): 1
                Dropped Mutations: 0
                Droppable tombstone ratio: 0.00000

It looks like the index table is not well partitioned? The maximum partition is ~800 MB with over 25 million cells, while the mean compacted partition is only 423 bytes.

-Joe

On 6/23/2022 4:47 PM, Joe Obernberger via lists.lfaidata.foundation wrote:

Thank you Boxuan - the code (a REST service) that modifies the graph is called continuously while running.  A slow example looks like this:

Metrics: Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[edgeID.eq(DS_106|geotime|hou...                     1           1        1393.727   100.00
  constructGraphCentricQuery                                                                   0.011
  GraphCentricQuery                                                                         1393.698
    \_condition=(edgeID = DS_106|geotime|hour 1km|2022-6-14T4|18SVK 86 22|Date,Actor2Geo_Long_Actor2Geo_Lat)
    \_orders=[]
    \_isFitted=true
    \_isOrdered=true
    \_query=multiKSQ[1]
    \_index=byEdgeIDComposite
    backend-query                                                      1                    1393.128
    \_query=byEdgeIDComposite:multiKSQ[1]
                                            >TOTAL                     -           -        1393.727        -

Code is:
System.out.println("Metrics: "+traversal.E().has("edgeID", edgeID).profile().next().toString());
dataSourceToCorrelationEdge = traversal.E().has("edgeID", edgeID).next();

If the time to run that exceeds 500 ms, I print the message.
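
Roughly, a sketch of that check might look like the following (the 500 ms threshold is from the message above; the ordering here - timing the real lookup first and only profiling when it is slow - is just one way to do it, and note that profile() re-executes the traversal, so a slow query runs twice):

long start = System.nanoTime();
Edge dataSourceToCorrelationEdge = traversal.E().has("edgeID", edgeID).next();
long elapsedMs = (System.nanoTime() - start) / 1_000_000;
if (elapsedMs > 500) {
    // Only pay the cost of a second, profiled execution when the first one was slow.
    System.out.println("Slow edge lookup (" + elapsedMs + " ms), profile: "
            + traversal.E().has("edgeID", edgeID).profile().next());
}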

-Joe

On 6/23/2022 4:15 PM, Boxuan Li wrote:
Profiler documentation is available here: https://tinkerpop.apache.org/docs/current/reference/#profile-step 

Can you do

traversal.E().has("edgeID", edgeID).profile().next()

and paste the output (when you experience the slowness) here?

Best,
Boxuan

On Jun 23, 2022, at 2:29 PM, Joe Obernberger via lists.lfaidata.foundation <joseph.obernberger=gmail.com@...> wrote:

I'm looking for documentation on how to use profile() - do you have any?

Queries like this are also slow:
Edge dataSourceToCorrelationEdge = traversal.E().has("edgeID", edgeID).next();

There is an index on edgeID (mgmt.buildIndex("byEdgeIDComposite", Edge.class).addKey(edgeID).unique().buildCompositeIndex();)
I'm trying to cache the actual JanusGraph edge id string in an external cache to help with this, but when it's not in the cache, the lookup can take 2 seconds to return.
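
A minimal sketch of that caching idea, assuming a simple in-memory map (the class and field names below are illustrative): on a cache hit the edge is fetched by its element id via traversal.E(id), which skips the composite index lookup entirely; on a miss it falls back to the byEdgeIDComposite index and remembers the id.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.Edge;

public class EdgeIdCache {
    // Maps the application-level "edgeID" property to the JanusGraph-assigned element id.
    private final Map<String, Object> idsByEdgeId = new ConcurrentHashMap<>();

    public Edge lookup(GraphTraversalSource traversal, String edgeID) {
        Object cachedId = idsByEdgeId.get(edgeID);
        if (cachedId != null) {
            return traversal.E(cachedId).next();                 // direct id lookup, no index
        }
        Edge edge = traversal.E().has("edgeID", edgeID).next();  // composite index lookup
        idsByEdgeId.put(edgeID, edge.id());
        return edge;
    }
}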

-Joe

On 6/22/2022 4:53 PM, Boxuan Li wrote:
That's very suspicious - it shouldn't take 3 seconds just to load 3 edges. Can you provide the profile() output here when you see such slowness? It is also worth checking whether other queries (e.g. loading a vertex) experience similar slowness.

From: janusgraph-users@... <janusgraph-users@...> on behalf of Joe Obernberger via lists.lfaidata.foundation<joseph.obernberger=gmail.com@...>
Sent: Wednesday, June 22, 2022 4:51 PM
To: janusgraph-users@... <janusgraph-users@...>
Subject: Re: [janusgraph-users] Getting Edges - Performance
 
It's usually small - typically around 3 edges.
I'm reading your article (https://li-boxuan.medium.com/janusgraph-deep-dive-part-3-speed-up-edge-queries-3b9eb5ba34f8).
Outbound edges, however, could be large - hundreds to tens of thousands.
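
As an aside, for vertices with that many edges, the vertex-centric index approach from the article would look roughly like this (sketch only - the edge label "correlatedWith" and sort key "created" are hypothetical names, not taken from this thread):

import org.apache.tinkerpop.gremlin.process.traversal.Order;
import org.apache.tinkerpop.gremlin.structure.Direction;
import org.janusgraph.core.EdgeLabel;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.PropertyKey;
import org.janusgraph.core.schema.JanusGraphManagement;

public class VertexCentricIndexSetup {
    public static void build(JanusGraph graph) {
        JanusGraphManagement mgmt = graph.openManagement();
        EdgeLabel correlatedWith = mgmt.getEdgeLabel("correlatedWith"); // hypothetical label
        PropertyKey created = mgmt.getPropertyKey("created");           // hypothetical sort key
        // Sorts each vertex's "correlatedWith" adjacency list by "created", so queries that
        // filter or limit on that key read only a slice instead of the whole list.
        mgmt.buildEdgeIndex(correlatedWith, "correlatedByCreated", Direction.BOTH, Order.desc, created);
        mgmt.commit();
    }
}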

-Joe

On 6/22/2022 4:49 PM, Boxuan Li wrote:
When it takes over 3 seconds to return, how many edges does it return?

From: janusgraph-users@... <janusgraph-users@...> on behalf of Joe Obernberger via lists.lfaidata.foundation<joseph.obernberger=gmail.com@...>
Sent: Wednesday, June 22, 2022 1:06 PM
To: janusgraph-users@... <janusgraph-users@...>
Subject: [janusgraph-users] Getting Edges - Performance
 
Hi All - I'm seeing a performance issue with this statement in Java code:

Iterator<Edge> edgeIt = vert.edges(Direction.IN);

In some cases this takes over 3 seconds to return.  What can I do to improve this performance?
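
(For reference, when only a handful of incoming edges are actually needed, one alternative sketch - the label name here is hypothetical - is to express the fetch as a traversal that filters and limits on the storage side:

import java.util.List;
import org.apache.tinkerpop.gremlin.structure.Edge;

// "traversal" is a GraphTraversalSource, "vert" the vertex from the statement above.
List<Edge> incoming = traversal.V(vert.id())
        .inE("correlatedWith")   // restrict by label when it is known (hypothetical name)
        .limit(100)              // cap how many edges are pulled back
        .toList();
)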
Thank you!

-Joe

