- Performance Improvement
Re: Performance Improvement
toggle quoted messageShow quoted text
Thank you for the detailed explanation, regarding the configuration and indexes. I will dig deeper into it and try to resolve the problem.
But I think the queries which I am executing are not efficient.
Request you to share the gremlin queries for the above two cases mentioned, in the previous mail. That will help a lot to validate the queries.
Thanks & Regards,
I didn't follow your statements about count but I just want to add that if you don't use mixed index for count query than your count will require iteratively returning each element and count them in-memory (i.e. very inefficient). To check if your count query is using mixed index you can use `profile()` step.
I also noticed that you say that you need to return all properties for all vertices / edges. If so, you may consider using multiQuery which will return properties for your vertices faster than valueMap() step in certain cases. The only thing you need to consider when using `multiQuery` (actually any query) is tx-cache size (don't confuse with database cache). In case your tx-cache size is too small to hold all the vertices than some vertices' properties will be evicted from the cache. Thus, when you will try to return values for the vertex properties it will make new database calls to retrieve those properties. In the worst case all your access to properties may lead to separate database calls. To eliminate this downside you need to make sure that your transaction cache size is at least the same amount of vertices your are accessing (or bigger). In such case `multiQuery().addAllVertices(yourVertices).properties()` will return all properties for all vertices and it will hold those properties in-memory instead of evicting them.
Moreover, it looks like your use cases are read-heavy and not write-heavy. You may improve your performance by making sure all your writes are using consistency-level=ALL and all your reads are using consistency-level=ONE. You may want to disable consistency check as well as internal / external checks for your transactions if you are sure about your data. It will make some of your queries faster but less safer.
You also need to make sure that you configured your CQL driver throughput optimally for your load. In case your JanusGraph is embedded into your Application you need to make sure your application has the smallest latency between your Cassandra nodes (you may even consider placing your application to the same nodes with Cassandra or just moving your Gremlin Server to those nodes and use remote connection).
There are many JanusGraph and CQL driver configurations which you may use to tune your performance for your use-case. This topic is to broad to give all-fits solution. Different use cases might need different approaches. I would strongly recommend you to explore all JanusGraph configurations here: https://docs.janusgraph.org/configs/configuration-reference/ . It will allow to configure your general JanusGraph configuration, your transactions configuration, and your CQL driver configuration much better if you are aware about all the configurations. For advanced CQL configuration options see this configurations here: https://docs.datastax.com/en/developer/java-driver/4.13/manual/core/configuration/reference/ (storage.cql.internal in JanusGraph).
You may also try exploring other storage backends which may give you smaller latency (hence better performance), like ScyllaDB, Aerospike, etc.
Join email@example.com to automatically receive all group messages.