BigTable - large rows (more than 256MB)
schwartz@...
Hi!
Boxuan Li
Hi Assaf,
Having too many vertices with the same label shouldn't be a problem by itself. The two most likely causes are:

1) You have a super node with too many edges.
2) You have a composite index with very low cardinality. Say you have 1 million vertices sharing the same value that is indexed by a composite index; the index entry for that value will then have 1 million rows.

Let me know if you have any follow-ups.

Best,
Boxuan
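A rough way to picture cause 2): a composite index keeps one entry (one storage row) per distinct indexed value, and every vertex with that value lands in the same entry. This is a toy Python sketch of that layout, not JanusGraph code; the names and the 1-million count are just illustrative:

```python
from collections import defaultdict

# Toy model of a composite index: one entry per distinct indexed value,
# holding a cell for every matching vertex id.
index = defaultdict(list)

NUM_VERTICES = 1_000_000
for vertex_id in range(NUM_VERTICES):
    shadow_label = "person"  # low cardinality: every vertex shares this value
    index[shadow_label].append(vertex_id)

# Low cardinality => few entries, each one huge.
print(len(index))            # 1 distinct value
print(len(index["person"]))  # 1,000,000 cells packed into a single entry
```

With high-cardinality keys the same million vertices would spread across many small entries instead of one giant one.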
schwartz@...
Hi Boxuan - thanks for the quick response!
I get a feeling that 2) might be the issue. Since JanusGraph has never allowed us to index labels, we ended up with a "shadow property", backed by a composite index, that lets us look up vertices by label.

What does the data model look like for composite indexes and the resulting rows? Does it mean that for a given "label", if we have too many vertices, the row the index resolves to becomes too large? Do you have any recommendation on how to approach this?

Thanks,
Assaf
Boxuan Li
Hi Assaf,
I see. That makes sense and unfortunately, I don't have a perfect solution. I would suggest you use a mixed index instead.

Regarding the data model, you can take a look at a blog I wrote earlier: https://li-boxuan.medium.com/janusgraph-deep-dive-part-2-demystify-indexing-d26e71edb386

In short, it's like storing a vertex, except that now the label value itself becomes a vertex and all indexed vertices become edges to that label vertex. So yes, if you have too many vertices with the same label (usually it becomes a problem when you have millions of such vertices), then the corresponding composite index entry will be very large.

Best,
Boxuan
schwartz@...
That's a great post. This is exactly the use-case we have, with a "type" property.

Regarding the usage of mixed indexes:
- I'm less concerned with property updates in this case (as opposed to inserts), as the type / label of a vertex won't change.
- However, I've seen cases in the past where queries relying on a mixed index fail because the index backend hasn't caught up to the storage backend yet. I'm guessing there's no way around that?

For another approach, how about cleanups / TTLs / etc.? Assuming the business model can sustain this, is there a way to compute, for a specific vertex label, an upper bound on the number of vertices?
Boxuan Li
> I've seen cases in the past, where queries relying on a mixed index fail while the index backend still hasn't caught up to the storage backend.
Yes, that could happen. You can use https://docs.janusgraph.org/operations/recovery/#transaction-failure to tackle this problem, but it also means you have an additional long-running process to maintain.

> How about cleanups / ttls / etc.?

Not sure if I understand it correctly. In your business model, are some vertices less important, such that they can be deleted? If frequent cleanup / TTL means the total number of vertices will drop significantly, then yes, that's going to help.

> what will be the upper bound for the number of vertices

My empirical number is a few million vertices with the same "type" (indexed by a composite index).
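For a rough upper bound, one can work backwards from the ~256 MB row size mentioned in the subject. Assuming, hypothetically, that each vertex costs on the order of 100 bytes in the index row (column qualifier + timestamp + overhead; the real per-cell cost depends on your key sizes and should be measured), a back-of-envelope estimate:

```python
ROW_LIMIT_BYTES = 256 * 1024 * 1024  # row size limit from the thread subject
BYTES_PER_CELL = 100                 # assumed per-vertex cost in the index row;
                                     # measure this for your own deployment

max_vertices_per_value = ROW_LIMIT_BYTES // BYTES_PER_CELL
print(max_vertices_per_value)  # 2_684_354 -> roughly 2.7 million vertices
```

That lands in the same "a few million" ballpark as the empirical number above, which is why low-cardinality composite indexes hit the limit so easily.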