BigTable - large rows (more than 256MB)
Boxuan Li
> I've seen cases in the past where queries relying on a mixed index fail while the index backend hasn't yet caught up to the storage backend.
Yes that could happen. You can use https://docs.janusgraph.org/operations/recovery/#transaction-failure to tackle this problem but it also means you have an additional long-running process to maintain.
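For reference, a minimal sketch of what that long-running recovery process looks like, following the linked docs (the properties path and the one-hour replay window are placeholders):

```java
import java.time.Instant;

import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import org.janusgraph.core.TransactionRecovery;

public class IndexRecoveryDaemon {
    public static void main(String[] args) {
        // Requires tx.log-tx=true in the graph configuration so that
        // JanusGraph keeps a write-ahead log of all transactions.
        JanusGraph graph = JanusGraphFactory.open("conf/janusgraph.properties");

        // Replays the transaction log from the given start time and re-applies
        // any index updates that never reached the index backend.
        TransactionRecovery recovery =
                JanusGraphFactory.startTransactionRecovery(graph, Instant.now().minusSeconds(3600));

        // Keep this process alive alongside the cluster; call
        // recovery.shutdown() when tearing it down.
    }
}
```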
> How about cleanups / TTLs / etc.?
I'm not sure I understand correctly. In your business model, are some vertices less important, such that they can be deleted? If frequent cleanups / TTLs mean the total number of vertices drops significantly, then yeah, that's going to help.
> what will be the upper bound for the number of vertices
My empirical number is a few million vertices with the same "type" (indexed by a composite index).
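If you want to see where you stand relative to that number, a quick sketch for counting vertices per "type" value (note this is a full graph scan, so it's heavy on a large graph):

```java
import java.util.Map;

import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;

// Given an open JanusGraph instance `graph`:
GraphTraversalSource g = graph.traversal();
// Count how many vertices share each value of the indexed "type" property;
// the has() step skips vertices that lack the property entirely.
Map<Object, Long> verticesPerType = g.V().has("type").groupCount().by("type").next();
verticesPerType.forEach((type, count) -> System.out.println(type + ": " + count));
```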
schwartz@...
That's a great post. This is exactly the use case we have, with a type property.
Regarding the usage of mixed indexes -
- I'm less concerned with property updates in this case (as opposed to "inserts"), as the type / label of a vertex won't change.
- However, I've seen cases in the past where queries relying on a mixed index fail while the index backend hasn't yet caught up to the storage backend. I'm guessing there's no way around that?
For another approach, how about cleanups / TTLs / etc.?
Assuming the business model can sustain this, is there a way to compute, for a specific vertex label, what will be the upper bound for the number of vertices?
Boxuan Li
Hi Assaf,
I see. That makes sense, and unfortunately I don't have a perfect solution. I would suggest you use a mixed index instead.
Regarding the data model, you can take a look at a blog post I wrote earlier (https://li-boxuan.medium.com/janusgraph-deep-dive-part-2-demystify-indexing-d26e71edb386). In short, a composite index is stored like a vertex, except that the indexed value itself becomes the "vertex" and all indexed vertices become "edges" attached to it. So yes, if you have too many vertices with the same label (usually it becomes a problem when you have millions of such vertices), then the corresponding composite index entry will be very large.
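A sketch of what the mixed-index schema change might look like; the index name here is illustrative, and "search" must match the indexing backend name configured in your properties file:

```java
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.janusgraph.core.PropertyKey;
import org.janusgraph.core.schema.JanusGraphManagement;

// Given an open JanusGraph instance `graph`:
JanusGraphManagement mgmt = graph.openManagement();
PropertyKey type = mgmt.getPropertyKey("type"); // the existing "shadow property"
// A mixed index stores its entries in the external indexing backend
// (e.g. Elasticsearch), so no single storage-backend row accumulates
// every vertex id that shares a value.
mgmt.buildIndex("typeMixed", Vertex.class).addKey(type).buildMixedIndex("search");
mgmt.commit();
```

Since the property already holds data, the new index will also need a reindex job before it becomes enabled.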
Best,
Boxuan
schwartz@...
Hi Boxuan - thanks for the quick response!
I have a feeling that 2) might be the issue. Since JanusGraph has never allowed us to index labels, we ended up with a "shadow property" backed by a composite index that lets us look up vertices by label.
What does the data model look like for composite indexes and the resulting rows? Does it mean that for a given "label", if we have too many vertices, the value the index resolves to becomes too large?
Do you have any recommendation on how to approach this?
Thanks,
Assaf
Boxuan Li
Hi Assaf,
Having too many vertices with the same label shouldn't be a problem in itself. The two most likely causes are:
1) You have a supernode that has too many edges.
2) You have a composite index with very low cardinality. Say you have 1 million vertices sharing a value that is indexed by a composite index; the index entry for that value will then hold 1 million rows (see the sketch below).
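To make 2) concrete, a hypothetical schema like the following is all it takes. Every vertex sharing a "type" value piles up under one index key in the storage backend, which is exactly the kind of key Bigtable flags as a large row:

```java
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.janusgraph.core.PropertyKey;
import org.janusgraph.core.schema.JanusGraphManagement;

// Given an open JanusGraph instance `graph`:
JanusGraphManagement mgmt = graph.openManagement();
PropertyKey type = mgmt.makePropertyKey("type").dataType(String.class).make();
// A composite index lives in the storage backend itself: one index entry
// per distinct value, so a million vertices sharing one value all land
// under a single key.
mgmt.buildIndex("byType", Vertex.class).addKey(type).buildCompositeIndex();
mgmt.commit();
```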
Let me know if you have any follow-ups.
Best,
Boxuan
schwartz@...
Hi!
We are using JG on top of Bigtable. While trying to understand some slow queries, I found the following warning in the Bigtable Key Visualizer: "Large rows — Found 1 single key storing more than 256MB."
I'm not sure I fully understand the data model: does this mean we have a single vertex that is very large (maybe a supernode)? Or does it mean we have too many vertices with a given label?
How does one begin to approach this?
Many thanks in advance,
Assaf