Is JanusGraph suitable for Ad Tech?
Ted Wilmes <twi...@...>
This may or may not help with hotspotting, depending on your access patterns. Also, if you expect to run queries that have to touch all of the incident edges on a high-degree vertex, as opposed to some small selective subset (by filtering on an edge property), you will probably need to run them using TinkerPop's OLAP options [1], which can be used with JanusGraph. High-degree vertices (supernodes) are still a pain point for many graph engines; some just extend the upper limit of what is considered a reasonable number of edges further than others.
On Friday, June 2, 2017 at 9:17:58 AM UTC-5, Jakub Liska wrote:
Jakub Liska <liska...@...>
Wow, I was looking for this the whole day, thanks Ted! It will need some serious play time, but I believe people must be using it, especially those who have these hotspots. This is exactly what makes JanusGraph useful even for use cases that are entirely unrelated to graphs.
Ted Wilmes <twi...@...>
Hi Jakub,
You have a few options with JanusGraph when it comes to high-degree vertices, though I wouldn't consider any of them automatic. Commonly you'll see changes made to the graph model to introduce bucketing of entities so that edge count (partition size) can be kept within some reasonable bounds. In many ways, this isn't that different from what you were probably attempting to do in C*/ScyllaDB. JanusGraph also supports vertex and edge partitioning; in your case, you'd probably be interested in vertex partitioning [1]. This will spread a vertex's incident edges around your cluster as opposed to localizing them all to one partition. Truth be told, I don't have production experience with that feature, as it is relatively new. If anyone has used it, I'd be curious to hear their thoughts. This will only help you up to a point though, and it is still possible to end up with partitions that can lead to operational issues if you're not careful.
--Ted
On Thursday, June 1, 2017 at 4:41:17 PM UTC-5, Jakub Liska wrote:
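The bucketing idea mentioned in this reply can be sketched roughly as follows. This is only an illustration in plain Python, not JanusGraph API code: the function names, the `entity#bucket` key format, and the bucket count of 16 are all assumptions made for the example. The idea is that a high-degree entity is modelled as several "bucket" vertices, and each edge is attached to the bucket chosen by hashing the neighbor's id, so no single vertex (and thus no single storage partition) carries all of the edges.

```python
import hashlib

# Assumption: 16 buckets per high-degree entity; tune this so that each
# bucket stays within an edge count your storage backend handles well.
NUM_BUCKETS = 16


def bucket_for(neighbor_id: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a neighbor id to a stable bucket index via a hash."""
    digest = hashlib.sha256(neighbor_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def bucketed_vertex_key(entity_id: str, neighbor_id: str,
                        num_buckets: int = NUM_BUCKETS) -> str:
    """Key of the (hypothetical) bucket vertex that holds the edge
    between entity_id and neighbor_id."""
    return f"{entity_id}#{bucket_for(neighbor_id, num_buckets)}"


def all_bucket_keys(entity_id: str, num_buckets: int = NUM_BUCKETS) -> list:
    """All bucket vertices to visit when every edge of the entity is needed."""
    return [f"{entity_id}#{b}" for b in range(num_buckets)]


if __name__ == "__main__":
    # A write or a point lookup touches exactly one bucket vertex.
    print(bucketed_vertex_key("cookie-123", "fingerprint-abc"))
    # A full read of the entity has to fan out over all buckets.
    print(all_bucket_keys("cookie-123", num_buckets=4))
```

The trade-off is the one already described above: lookups for a known pair stay cheap, while reading all edges of an entity means visiting every bucket, and a fixed bucket count only pushes the upper limit further out rather than removing it.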
Jakub Liska <liska...@...>
Hey,
we've been using ScyllaDB for persisting and analyzing impressions and especially cookies, digital fingerprints and other similar internet "identities". The ultimate goal is finding relationships between these ids, so in the end you'd have many x_by_y tables; in fact, there is a many2many relationship between these ids.
This is a bit troublesome on columnar databases, as they require designing column families with partitions that are as equally sized as possible. That is a real challenge in this niche, because you find yourself unable to comply with the "same partition size" rule of thumb:
1) a certain share of internet users are not humans but machines, and they are able to generate an unpredictable number of impressions
2) it's a multitenant environment where websites and campaigns can be tiny or huge with respect to the number of impressions
3) it is not true time-series data where you could leverage time for proper partition sizing
Now, I'm aware that graph databases are a perfect match for niches with complex graphs, which this use case is not; there would be just several types of edges and nodes. But am I right in saying that I could leverage JanusGraph for the varying partition size problem? Can JanusGraph properly deal with many2many relationships ranging from 1 to 1 million and scale well at the same time? Is this taken care of under the hood at the C* / ScyllaDB level?