Searching with composite indexes - performance gradually decreases
Dilan Ranasinghe <dila...@...>
I'm doing a POC on JanusGraph. In my scenario every vertex has a unique ID, and I used it for indexing:
k = m.makePropertyKey("customId").dataType(String.class).make()
The program I used for the POC inserts into the graph from one thread, while searching and updating happen constantly in another thread.
Searching and updating are done as follows.
Searching : g.V().has("customId","requiredCustomId")
What I noticed is that at the start of the program search queries were served in around 12 ms, and after inserting around 30 million nodes this had increased to 20 ms.
For the update queries it was worse: latency went from 12 ms to 23 ms.
Can this be due to the growth of the index store, since I have indexed all 30 million nodes?
Is it a good idea to index all the vertices in a graph?
If I'm only doing exact-match searches on the indexed values, do I need to use an indexing back-end like Solr?
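For anyone trying to reproduce numbers like the 12 ms and 20 ms above, here is a minimal sketch of how per-query latency could be recorded in windows (for example, one window per N inserted vertices) so the trend can be compared as the graph grows. `LatencyWindow` and its methods are hypothetical names made up for this illustration; they are not part of JanusGraph or TinkerPop, and the class is not thread-safe, so each thread would need its own instance.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical helper: collects call latencies for one measurement window. */
final class LatencyWindow {
    private final List<Long> samplesNanos = new ArrayList<>();

    /** Runs the call once, records its wall-clock duration, and returns its result. */
    <T> T time(Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            samplesNanos.add(System.nanoTime() - start);
        }
    }

    /** Adds an already measured duration, in nanoseconds. */
    void record(long nanos) {
        samplesNanos.add(nanos);
    }

    /** Nearest-rank percentile (0 < p <= 100) of the recorded samples, in milliseconds. */
    double percentileMillis(double p) {
        if (p <= 0 || p > 100) throw new IllegalArgumentException("p must be in (0, 100]");
        if (samplesNanos.isEmpty()) throw new IllegalStateException("no samples recorded");
        List<Long> sorted = new ArrayList<>(samplesNanos);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1) / 1_000_000.0;
    }

    /** Number of samples in the current window. */
    int count() {
        return samplesNanos.size();
    }

    /** Starts a new window, e.g. after every N inserted vertices. */
    void reset() {
        samplesNanos.clear();
    }
}
```

In the search thread, the query from this post could be wrapped as `window.time(() -> g.V().has("customId", id).tryNext())`, assuming `g` is the traversal source already in use. Logging `percentileMillis(50)` and `percentileMillis(99)` before each `reset()` shows whether the median or only the tail is drifting upwards.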
Ted Wilmes <twi...@...>
What backing store are you using? Also, are you running Janus embedded, or does your client hit Gremlin Server remotely? I'll assume you're using Cassandra just to give you a few things to check, but these should be generally applicable. First I'd see where the slowdown is coming from. My guess would be the Janus IO with its backing store, but you never know; it could be as simple as bumping up the heap size on your client. Here are a few things to watch as you run your load:
* Client (and janus server if it's running separate from the client) & Cassandra GC stats (GC counts & latencies)
* Cassandra (or other backing store) read & write latencies
If you see the increase coming from the backing store I'd recommend following the tuning tips appropriate for your backing store to see if you can stabilize that performance to a point you feel comfortable. Also, if this is a POC, be sure to test on hardware that is representative of what you'd actually be using in production. If it's just your laptop, I'd take any performance learnings with a grain of salt. A single laptop setup is not a good proxy for production performance against a distributed backend.
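As a rough illustration of the GC counts and latencies in the checklist above, the sketch below samples the collectors of the JVM it runs in using the standard `java.lang.management` API. `GcSnapshot` is a hypothetical name chosen for this example. It only sees the local JVM (the client, or Janus when embedded); Cassandra and a separate Janus server would have to be observed on their own processes.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: totals of GC activity for the current JVM at one point in time. */
final class GcSnapshot {
    final long collections; // number of collections across all collectors
    final long timeMillis;  // approximate accumulated collection time in milliseconds

    private GcSnapshot(long collections, long timeMillis) {
        this.collections = collections;
        this.timeMillis = timeMillis;
    }

    /** Sums collection count and accumulated time over every collector of this JVM. */
    static GcSnapshot take() {
        long count = 0;
        long time = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the collector does not report the value.
            count += Math.max(gc.getCollectionCount(), 0);
            time += Math.max(gc.getCollectionTime(), 0);
        }
        return new GcSnapshot(count, time);
    }

    /** GC activity that happened between an earlier snapshot and this one. */
    GcSnapshot since(GcSnapshot earlier) {
        return new GcSnapshot(collections - earlier.collections, timeMillis - earlier.timeMillis);
    }
}
```

Taking one snapshot before a batch of searches and updates and calling `GcSnapshot.take().since(before)` afterwards gives the GC count and time attributable to that batch. If those numbers climb along with the query latency, heap size on that JVM is a likely suspect; if they stay flat, the backing store's read and write latencies are the next thing to look at.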
On Friday, September 29, 2017 at 2:14:08 AM UTC-5, Dilan Ranasinghe wrote:
Andy Davidoff <an...@...>
On Sep 29, 2017, at 1:14 AM, Dilan Ranasinghe <dila...@...> wrote:
> If i'm doing only the searches with perfect match for the indexed values, do i need to use an indexing back-end like solr?
No, and for your use case, a mixed index will result in worse performance.