
Re: CQL scaling limit?

madams@...
 

Hi Boxuan,

I can definitely try the 0.6.0 pre-release or master version, that's a good idea.
I'll come back with the results,

Thanks!
Cheers,
Marc


Re: CQL scaling limit?

Boxuan Li
 

Hi,

I didn't check your metrics but my first impression was this might be related to the internal thread pool. Can you try out the 0.6.0 pre-release version or master version? Remember to set `storage.cql.executor-service.enabled` to false. Before 0.6.0, an internal thread pool was used to process all CQL queries, which had a hard-coded 15 threads. https://github.com/JanusGraph/janusgraph/pull/2700 made the thread pool configurable, and https://github.com/JanusGraph/janusgraph/pull/2741 further made this thread pool optional.
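On 0.6.0 that would look something like the following in the properties file (the enable/size options come from the PRs above; treat the exact names as something to verify against the 0.6.0 configuration reference):

```properties
# Bypass the internal CQL executor service entirely (0.6.0+)
storage.cql.executor-service.enabled=false

# Alternatively, keep the executor service but size it explicitly
# instead of the old hard-coded 15 threads:
# storage.cql.executor-service.enabled=true
# storage.cql.executor-service.core-pool-size=32
```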

EDIT: Just realized your problem was related to the horizontal scaling of JanusGraph instances. Then this internal thread pool thing is likely not related - but still worth trying.

Hope this helps.

Best,
Boxuan


CQL scaling limit?

madams@...
 

Hi all,

We've been trying to scale JanusGraph horizontally to keep up with the data throughput, but we're hitting some scaling limit.
We've tried different things to pinpoint the bottleneck, but we're struggling to find it... Some help would be most welcome :)

Our current setup:

  • 6 ScyllaDB Instances
  • 6 Elasticsearch Instances
  • Our "indexers" running JanusGraph as a library; they can be scaled up and down
    • They read data from our sources and write it to JanusGraph
    • Each indexer runs in its own JVM
Our scaling test looks like this:



Each indexer counts the number of records it processes. We start with one indexer, and every 5~10 minutes we double the number of indexers and observe how the overall performance changes.
From top to bottom, left to right, these panels represent:

  • Total Processed Records: The overall performance of the system
  • Average Processed Records: Average per Indexer, ideally this should be a flat curve
  • Number of Running Indexers: We scaled at 1, 2, 4, 8, 16, 32, 64
  • Processed Records Per Indexer
  • Cpu Usage: The CPU usage per indexer
  • Heap: Heap usage per indexer. The red line is the maximum heap size; we left a generous margin


As you can see, past 4 indexers the performance per indexer decreases, until no additional throughput can be reached. At first we thought this might simply be due to resource limitations, but ScyllaDB and Elasticsearch are not really struggling. The ScyllaDB load and read/write latency looked good during this test:



Both ScyllaDB and Elasticsearch are running on NVMe drives, with 10GB+ of RAM. ScyllaDB is also deployed with CPU pinning. With a cassandra-stress test we can really max out ScyllaDB.

Our janusgraph configuration looks like:

storage.backend=cql
storage.hostname=scylla
storage.cql.replication-factor=3
index.search.backend=elasticsearch
index.search.hostname=elasticsearch
index.search.elasticsearch.transport-scheme=http
index.search.elasticsearch.create.ext.number_of_shards=4
index.search.elasticsearch.create.ext.number_of_replicas=2
graph.replace-instance-if-exists=true
schema.default=none
tx.log-tx=true
tx.max-commit-time=170000
ids.block-size=100000
ids.authority.wait-time=1000


Each input record contains ~20 vertices and ~20 edges. The workflow of the indexer is:

  1. For each vertex, check if it exists in the graph using a composite index. Create it if it does not.
  2. Insert the edges, using the vertex IDs returned by step 1

Each transaction inserts ~ 10 records. Each indexer runs 10 transactions in parallel.
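Step 1's check-then-create can be sketched as follows. This is a self-contained illustration of the get-or-create pattern only, using a ConcurrentHashMap as a stand-in for the composite-index lookup (the names here are hypothetical, not JanusGraph API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class GetOrCreateSketch {
    // Stand-in for the composite index: unique key -> vertex id
    static final Map<String, Long> index = new ConcurrentHashMap<>();
    static final AtomicLong idGen = new AtomicLong(1);

    // Step 1: look the vertex up by its unique key; create it if absent.
    // computeIfAbsent makes the read-then-create atomic within one JVM,
    // but across independent indexer JVMs the same key can still race.
    static long getOrCreateVertex(String key) {
        return index.computeIfAbsent(key, k -> idGen.getAndIncrement());
    }

    public static void main(String[] args) {
        long a = getOrCreateVertex("user:42");
        long b = getOrCreateVertex("user:42"); // same key -> same id
        System.out.println(a == b);            // prints "true"
    }
}
```

In the real indexer this read-before-write happens inside each transaction, so with many parallel indexers the same keys can be checked and created concurrently, which is exactly where the consistency settings discussed below start to matter.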

We tried different things but without success:

  • Increasing/decreasing the transaction size
  • Increasing/decreasing storage.cql.batch-statement-size
  • Enabling/disabling batch loading
  • Increasing the ids.block-size to 10 000 000

The most successful test so far was to switch the CQL write consistency level to ANY and the read consistency level to ONE:
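The consistency switch corresponds to these CQL options in the properties file (a sketch; double-check the option names against your JanusGraph version's configuration reference):

```properties
storage.cql.read-consistency-level=ONE
storage.cql.write-consistency-level=ANY
```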



This time the indexers scaled nicely up to 16, and overall performance still increased when scaling to 32 indexers. Once we reached 64 indexers, though, the performance dropped dramatically. At that point ScyllaDB had a little more load, but it still doesn't look like it's struggling:

We don't use LOCK consistency. We never update vertices or edges, so FORK consistency doesn't look useful in our case.

It really looks like something somewhere is a bottleneck when we start scaling.

I checked out the JanusGraph GitHub repo locally and went through it to try to understand what set of operations JanusGraph performs to insert vertices/edges and how this binds to transactions, but I'm struggling a little to find that information.

So, any idea/recommendations?

Cheers,
Marc


Re: graphml properties of properties

Laura Morales <lauretas@...>
 

FWIW I've tried exporting the graph in the example to JSON (GraphSON) and the metaproperty *is* preserved; however, when I import the same graph from the JSON file the metaproperty is not created.
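For context, GraphSON 3 encodes types explicitly and nests vertex-property properties. Sketched roughly from memory (ids are arbitrary; verify against an actual export), the "name" property with its metaproperty would look something like:

```json
{
  "@type" : "g:VertexProperty",
  "@value" : {
    "id" : { "@type" : "g:Int64", "@value" : 0 },
    "label" : "name",
    "value" : "marko",
    "properties" : { "metatest" : "hi" }
  }
}
```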

Sent: Thursday, August 26, 2021 at 6:36 AM
From: "Laura Morales" <lauretas@...>
To: janusgraph-users@...
Cc: janusgraph-users@...
Subject: Re: [janusgraph-users] graphml properties of properties

Thank you for this example.
After running this, I can see that the property "metatest" has been ignored and is missing completely from the GraphML output. Another issue that I have with GraphML is that it apparently cannot represent all the key types supported by Janus. For example, it does not define any attribute for "date" and "time", and it does not allow specifying "int32" or "int64"; it only defines basic primitives such as string, int, double.

What serialization format should I use to best match Janus? One that allows metaproperties and also all the various types (date, int32, char, etc.). I also need it to be human-readable because I'm editing my graph file manually, and then I load this file into Janus. GraphML is not that bad, I can use it... it's just too limited given that it does not support the features mentioned above. Is there any better alternative? Or should I roll my own?



Sent: Wednesday, August 25, 2021 at 5:05 PM
From: hadoopmarc@...
To: janusgraph-users@...
Subject: Re: [janusgraph-users] graphml properties of properties
Hi Laura,

No. As the TinkerPop docs say: "graphML is a lossy format".

You can try for yourself with:
gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V(1).properties('name').elementMap()
==>[id:0,key:name,value:marko]
gremlin> g.V(1).properties('name').property('metatest', 'hi')
==>vp[name->marko]
gremlin> g.V(1).properties('name').elementMap()
==>[id:0,key:name,value:marko,metatest:hi]
gremlin> g.addV('person').property('name', 'turing')
==>v[13]
gremlin> g.io('data/metatest.xml').write().iterate()
gremlin>
Best wishes,    Marc


Re: graphml properties of properties

Laura Morales <lauretas@...>
 

Thank you for this example.
After running this, I can see that the property "metatest" has been ignored and is missing completely from the GraphML output. Another issue that I have with GraphML is that it apparently cannot represent all the key types supported by Janus. For example, it does not define any attribute for "date" and "time", and it does not allow specifying "int32" or "int64"; it only defines basic primitives such as string, int, double.

What serialization format should I use to best match Janus? One that allows metaproperties and also all the various types (date, int32, char, etc.). I also need it to be human-readable because I'm editing my graph file manually, and then I load this file into Janus. GraphML is not that bad, I can use it... it's just too limited given that it does not support the features mentioned above. Is there any better alternative? Or should I roll my own?



Sent: Wednesday, August 25, 2021 at 5:05 PM
From: hadoopmarc@...
To: janusgraph-users@...
Subject: Re: [janusgraph-users] graphml properties of properties
Hi Laura,

No. As the TinkerPop docs say: "graphML is a lossy format".

You can try for yourself with:
gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V(1).properties('name').elementMap()
==>[id:0,key:name,value:marko]
gremlin> g.V(1).properties('name').property('metatest', 'hi')
==>vp[name->marko]
gremlin> g.V(1).properties('name').elementMap()
==>[id:0,key:name,value:marko,metatest:hi]
gremlin> g.addV('person').property('name', 'turing')
==>v[13]
gremlin> g.io('data/metatest.xml').write().iterate()
gremlin>
Best wishes,    Marc


Re: graphml properties of properties

hadoopmarc@...
 

Hi Laura,

No. As the TinkerPop docs say: "graphML is a lossy format".

You can try for yourself with:
gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V(1).properties('name').elementMap()
==>[id:0,key:name,value:marko]
gremlin> g.V(1).properties('name').property('metatest', 'hi')
==>vp[name->marko]
gremlin> g.V(1).properties('name').elementMap()
==>[id:0,key:name,value:marko,metatest:hi]
gremlin> g.addV('person').property('name', 'turing')
==>v[13]
gremlin> g.io('data/metatest.xml').write().iterate()
gremlin>
Best wishes,    Marc


Too low Performance when running PageRank and WCC on Graph500

shepherdkingqsp@...
 

Hi there,

Recently I have been trying to measure JanusGraph performance. When running benchmarks with JanusGraph 0.5.3, I found very low performance when testing PageRank and WCC.

The code I used that you can refer to:
https://github.com/gaolk/graph-database-benchmark/tree/master/benchmark/janusgraph

Data:
Graph500

The environment:
Janusgraph Version: 0.5.3 (download the full release zip from janugraph github)

The config of Janusgraph (default conf/janusgraph-cql.properties)
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=cql
storage.batch-loading=true
storage.hostname=127.0.0.1
storage.cql.keyspace=janusgraph
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5
To be more specific, I ran K-hop with it and got reasonable results:

K-Hop    Latency
1-Hop    23.42
2-Hop    16628.49
3-Hop    1872747.62  (2/10 timed out at 2h)
4-Hop    889146.03   (8/10 timed out at 2h)
5-Hop    10/10 timed out at 2h
6-Hop    10/10 timed out at 2h


But when I ran WCC and PageRank, both timed out after 3 hours.

Could somebody help me find the reason for the low performance?


Regards,
Shipeng


Re: Fail to load complete edge data of Graph500 to Janusgraph 0.5.3 with Cassandra CQl as storage backends

shepherdkingqsp@...
 

Hi Marc,

I have tried it, and finally I got the complete set of Graph500 vertices and edges loaded.

But there is still a weird thing: I found the same exception reported in the log.

Could you please explain this? Even though the exception was reported, the data was still loaded completely?

Regards,
Shipeng


Re: Not able to enable Write-ahead logs using tx.log-tx for existing JanusGraph setup

Radhika Kundam
 

Hi Boxuan,

Thank you for the response. I tried force-closing all management instances except the current one before setting "tx.log-tx". Management is getting updated with the latest value; this was not the issue even without closing other instances. The issue is that graph.getConfiguration().hasLogTransactions() is not refreshed by the JanusGraphManagement property update. My understanding is that logTransactions is only set in GraphDatabaseConfiguration:preLoadConfiguration, which is called only when we open a graph instance via JanusGraphFactory.open(config). I don't see any setter method that updates logTransactions when we update the JanusGraphManagement property. Because of this, after updating JanusGraphManagement we have to restart the cluster, which invokes GraphDatabaseConfiguration:preLoadConfiguration and only then updates the logTransactions value.

Please correct me if I am missing anything. It would be really helpful to know of any alternative approach to update logTransactions when we update the management property, without restarting the cluster.

Thanks,
Radhika


Re: org.janusgraph.diskstorage.PermanentBackendException: Read 1 locks with our rid but mismatched timestamps

Ronnie
 

Hi Marc,
Thanks for the suggestions.

We narrowed the issue down to JanusGraph's support for Azul Prime JDK 8, which produces nanosecond precision from the Instant.now() API, compared to the millisecond precision of OpenJDK 8. The issue was resolved by applying the patch for https://github.com/JanusGraph/janusgraph/issues/1979.
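A quick way to see which precision a JDK's clock produces is to sample Instant.now() and check for sub-millisecond digits (a standalone sketch, not tied to JanusGraph):

```java
import java.time.Instant;

public class ClockPrecision {
    // Returns true if any sample carries sub-millisecond digits,
    // i.e. the JDK's clock is finer than millisecond precision.
    static boolean subMillisecondClock() {
        for (int i = 0; i < 1000; i++) {
            if (Instant.now().getNano() % 1_000_000 != 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("sample nanos: " + Instant.now().getNano());
        System.out.println("sub-millisecond clock: " + subMillisecondClock());
    }
}
```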

Thanks!
Ronnie


Re: Fail to load complete edge data of Graph500 to Janusgraph 0.5.3 with Cassandra CQl as storage backends

shepherdkingqsp@...
 

On Tue, Aug 24, 2021 at 06:20 AM, <hadoopmarc@...> wrote:
Got it. I will try it soon. 

Thanks, Marc!

Shipeng


Re: Fail to load complete edge data of Graph500 to Janusgraph 0.5.3 with Cassandra CQl as storage backends

hadoopmarc@...
 

Hi Shipeng Qi,

The system that you use might be too small for the number of threads in the loading code. You can try to decrease the number of threads from 8 to 4 with:

private static ExecutorService pool = Executors.newFixedThreadPool(4);

Best wishes,    Marc


Re: Not able to enable Write-ahead logs using tx.log-tx for existing JanusGraph setup

Boxuan Li
 

I suspect this is due to stale management instances. Check out https://developer.ibm.com/articles/janusgraph-tips-and-tricks-pt-2/#troubleshooting-indexes and see if it helps.


Fail to load complete edge data of Graph500 to Janusgraph 0.5.3 with Cassandra CQl as storage backends

shepherdkingqsp@...
 

Hi there,

I am new to Janusgraph. I have some problems in loading data to Janusgraph with Cassandra CQL as storage backend.

When I tried to load Graph500 into JanusGraph, planning to run benchmarks on it, I found that the edges loaded were not complete: 67107183 edges were loaded while 67108864 were expected. (The vertices loaded were complete.)

The code and config I used are posted below.

The code I used is a benchmark by tigergraph:
- load vertex: https://github.com/gaolk/graph-database-benchmark/blob/master/benchmark/janusgraph/multiThreadVertexImporter.java
- load edge: https://github.com/gaolk/graph-database-benchmark/blob/master/benchmark/janusgraph/multiThreadEdgeImporter.java

The config I used is conf/janusgraph-cql.properties in Janusgraph 0.5.3 full (https://github.com/JanusGraph/janusgraph/releases/download/v0.5.3/janusgraph-full-0.5.3.zip)
cache.db-cache-clean-wait = 20
cache.db-cache-size = 0.5
cache.db-cache-time = 180000
cache.db-cache = true
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=cql
storage.batch-loading=true
storage.cql.keyspace=janusgraph 
storage.hostname=127.0.0.1
I got the following exceptions when loading data.
Exception 1:
Caused by: java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.1:9042 (com.datastax.driver.core.exceptions.OperationTimedOutException: [/127.0.0.1:9042] Timed out waiting for server response))
        at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
        at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
        at io.vavr.control.Try.of(Try.java:62)
        at io.vavr.concurrent.FutureImpl.lambda$run$2(FutureImpl.java:199)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Exception 2:
Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Could not successfully complete backend operation due to repeated temporary exceptions after PT10S
        at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:100)
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:54)
        at org.janusgraph.diskstorage.BackendTransaction.executeRead(BackendTransaction.java:469)
        at org.janusgraph.diskstorage.BackendTransaction.indexQuery(BackendTransaction.java:395)
        at org.janusgraph.graphdb.query.graph.MultiKeySliceQuery.execute(MultiKeySliceQuery.java:52)
        at org.janusgraph.graphdb.database.IndexSerializer.query(IndexSerializer.java:515)
        at org.janusgraph.graphdb.util.SubqueryIterator.<init>(SubqueryIterator.java:66)
        ... 20 more
Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend
        at io.vavr.API$Match$Case0.apply(API.java:3174)
        at io.vavr.API$Match.of(API.java:3137)
        at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.lambda$static$0(CQLKeyColumnValueStore.java:123)
        at io.vavr.control.Try.getOrElseThrow(Try.java:671)
        at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.getSlice(CQLKeyColumnValueStore.java:290)
        at org.janusgraph.diskstorage.keycolumnvalue.KCVSProxy.getSlice(KCVSProxy.java:76)
        at org.janusgraph.diskstorage.keycolumnvalue.cache.ExpirationKCVSCache.lambda$getSlice$1(ExpirationKCVSCache.java:91)
        at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4742)
        at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)

I have searched on Google but found little that helps. Could somebody help?


Best Regards,
Shipeng Qi


Re: Not able to enable Write-ahead logs using tx.log-tx for existing JanusGraph setup

Radhika Kundam
 

Hi Boxuan,

For the existing JanusGraph setup, I am updating the tx.log-tx configuration by setting the management system property as mentioned in https://docs.janusgraph.org/basics/configuration/#global-configuration
And I could see the configuration updated properly in JanusGraphManagement.

managementSystem.get("tx.log-tx"); => prints false
managementSystem.set("tx.log-tx", true);
managementSystem.commit();
managementSystem.get("tx.log-tx"); => prints true

But this change is not reflected in logTransactions in GraphDatabaseConfiguration:preLoadConfiguration:
graph.getConfiguration().hasLogTransactions() => prints false
During transaction recovery, StandardTransactionLogProcessor checks graph.getConfiguration().hasLogTransactions(), which does not pick up the latest 'tx.log-tx' value set through the ManagementSystem.
To reflect the change, I had to restart the cluster twice.
Also, since it's a GLOBAL property, I am not allowed to override it using graph.configuration(); the only available option is to update it through the ManagementSystem, which does not update logTransactions.

I would really appreciate your help on this.

Thanks,
Radhika


Re: Wait the mixed index backend

toom@...
 

Thank you, this is exactly what I was looking for.


Re: Wait the mixed index backend

Boxuan Li
 

Hi Toom,

Do you want to ALWAYS make sure the vertex is indexed? If so, and if you happen to use Elasticsearch, you can set
index.[X].elasticsearch.bulk-refresh=wait_for
See https://www.elastic.co/guide/en/elasticsearch/reference/master/docs-refresh.html.

Best,
Boxuan


Wait the mixed index backend

toom@...
 

Hello,
 
Vertices that have just been created are not immediately available in the mixed index (documented here [1]).
I'm looking for a way to make sure a vertex is indexed, by waiting for the mixed index backend. I think the easiest way is to request the vertex id using a direct index query:
  graph.indexQuery("myIndex", "v.id:" + vertex.id())

But I didn't find a way to do that. Do you think this feature could be added? Maybe I can make a PR.
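In the meantime, the waiting itself can be factored into a small polling helper. A sketch in plain Java, where the BooleanSupplier would wrap the graph.indexQuery(...) check (assumed usage, not an existing JanusGraph API):

```java
import java.util.function.BooleanSupplier;

public class IndexWait {
    // Poll `condition` until it returns true or `timeoutMs` elapses.
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) return true;
            Thread.sleep(pollMs);
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Example: the condition becomes true after ~200 ms.
        boolean ok = waitFor(() -> System.currentTimeMillis() - start > 200, 2000, 50);
        System.out.println(ok); // prints "true"
    }
}
```

In the real case the supplier would run the direct index query for the element's id and return whether it produced a hit.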
 
Regards,
 
Toom.
 
[1] https://github.com/JanusGraph/janusgraph/blob/v0.5.3/docs/index-backend/direct-index-query.md#mixed-index-availability-delay


graphml properties of properties

Laura Morales <lauretas@...>
 

Janus supports "properties of properties", i.e. properties defined on other properties. How is this represented in GraphML? Should I use nested elements like this

<node>
<data key="foo">
bar
<data key="xxx">yyy</data>
</data>
</node>

or should I use attributes like this?

<node>
<data key="foo" xxx="yyy">bar</data>
</node>


Re: Release vs full release?

hadoopmarc@...
 

janusgraph-full includes Gremlin Server
