Re: Is JanusGraph suitable for Ad Tech?

Ted Wilmes <twi...@...>

Hi Jakub,
You have a few options with JanusGraph when it comes to high-degree vertices, though I wouldn't consider any of them automatic.
Commonly, you'll see changes made to the graph model to introduce bucketing of entities so that edge count (partition size) can be kept within 
some reasonable bounds. In many ways, this isn't that different from what you were probably attempting to do in C*/ScyllaDB. JanusGraph also supports
vertex and edge partitioning; in your case, you'd probably be interested in vertex partitioning [1]. This will spread a vertex's incident edges around your
cluster as opposed to localizing them all to one partition. Truth be told, I don't have production experience with that feature as it is relatively new.
If anyone has used it, I'd be curious to hear their thoughts. This will only help you up to a point though, and it is still possible to get partitions that
can lead to operational issues if you're not careful.
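
To make the bucketing idea a bit more concrete, here is a minimal Python sketch. It is not JanusGraph API code; it only illustrates one possible way to derive a bucket vertex key, so that a single busy entity (say, a bot cookie) is modeled as several smaller vertices. The names NUM_BUCKETS, bucket_for, bucketed_vertex_key and all_bucket_keys, as well as the "entity#bucket" key format, are hypothetical and chosen purely for illustration.

```python
import hashlib
from collections import Counter

# Hypothetical bucket count; in practice you would tune it so that the
# number of edges per bucket vertex stays within comfortable bounds.
NUM_BUCKETS = 16


def bucket_for(neighbor_id: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a neighbor id to a stable bucket number in [0, num_buckets)."""
    # A cryptographic hash is used only because it is stable across runs
    # and processes, unlike Python's built-in hash() for strings.
    digest = hashlib.sha256(neighbor_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def bucketed_vertex_key(entity_id: str, neighbor_id: str,
                        num_buckets: int = NUM_BUCKETS) -> str:
    """Key of the bucket vertex that would hold the edge entity -> neighbor."""
    return f"{entity_id}#{bucket_for(neighbor_id, num_buckets)}"


def all_bucket_keys(entity_id: str, num_buckets: int = NUM_BUCKETS) -> list:
    """All bucket vertices to visit when reading every edge of an entity."""
    return [f"{entity_id}#{b}" for b in range(num_buckets)]


if __name__ == "__main__":
    # Simulate one very busy cookie and count edges per bucket vertex.
    edges_per_bucket = Counter(
        bucketed_vertex_key("cookie-42", f"impression-{i}")
        for i in range(100_000)
    )
    print("bucket vertices used:", len(edges_per_bucket))
    print("largest bucket:", max(edges_per_bucket.values()))
```

The trade-off is the same as with bucketed partition keys in C*/ScyllaDB: writes touch a single bucket vertex, while a read of all of an entity's edges has to fan out over every key returned by all_bucket_keys.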


On Thursday, June 1, 2017 at 4:41:17 PM UTC-5, Jakub Liska wrote:

We've been using ScyllaDB for persisting and analyzing impressions, and especially cookies, digital fingerprints and other similar internet "identities".

The ultimate goal is finding relationships between these ids, so in the end you'd have many x_by_y tables; in fact, there is a many2many relationship between these ids.

This is a bit troublesome on columnar databases, as they require designing column families with partitions as equally sized
as possible, which is a real challenge in this niche because you find yourself unable to comply with the "same partition size" rule of thumb:
 1) a certain share of internet users are not humans but machines, and they can generate an unpredictable number of impressions
 2) it's a multitenant environment where websites and campaigns can be tiny or huge with respect to the number of impressions
 3) it is not true time-series data where you could leverage time for proper partition sizing

Now, I'm aware that graph databases are a perfect match for niches with complex graphs, which this use case is not;
there would be just a few types of edges and nodes. But am I right in saying that I could leverage JanusGraph for the varying partition size problem?

Can JanusGraph properly deal with many2many relationships whose cardinality ranges from 1 to 1 million, and scale well at the same time?
Is this taken care of under the hood at the C* / ScyllaDB  level? 
