BigTable - large rows (more than 256MB)
schwartz@...
Hi!


Boxuan Li
Hi Assaf,
Having too many vertices with the same label shouldn't be a problem by itself. The two most likely causes are:
1) You have a super node that has too many edges.
2) You have a composite index with very low cardinality. Let's say you have 1 million vertices with the same value indexed by a composite index; then the index entry for that value will hold 1 million entries, all stored in a single row.
Let me know if you have any follow-ups.
Best,
Boxuan
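To check whether cause 2) applies, one rough diagnostic is to count how many vertices share each indexed value: the biggest buckets approximate the biggest composite-index rows. This is a minimal Python sketch, assuming you can export the values of the indexed property into a list; `largest_index_buckets` and `cardinality_ratio` are hypothetical helper names, not JanusGraph APIs.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def largest_index_buckets(values: Iterable[Hashable], top: int = 3) -> List[Tuple[Hashable, int]]:
    """Return the `top` indexed values shared by the most vertices.

    In a composite index, all vertices with the same value are stored under
    the same index key, so the largest buckets approximate the largest rows.
    """
    return Counter(values).most_common(top)


def cardinality_ratio(values: Iterable[Hashable]) -> float:
    """Distinct values divided by indexed vertices (1.0 means all values unique)."""
    vals = list(values)
    if not vals:
        return 0.0
    return len(set(vals)) / len(vals)


if __name__ == "__main__":
    # Hypothetical data: one dominant value, as with a low-cardinality "type".
    types = ["person"] * 999_990 + ["company"] * 10
    print(largest_index_buckets(types, top=2))  # [('person', 999990), ('company', 10)]
    print(cardinality_ratio(types))  # 2e-06
```

A ratio close to zero combined with one very large bucket is the pattern described above: a handful of values, each owning a huge index row.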


schwartz@...
Hi Boxuan, thanks for the quick response!
I have a feeling that 2) might be the issue. Since JanusGraph has never allowed us to index labels, we ended up with a "shadow property" that is indexed by a composite index so that we can look vertices up by label. What does the data model look like for composite indexes and the resulting rows? Does it mean that, for a given "label", if we have too many vertices, the value the index resolves to is too large? Do you have any recommendations on how to approach this?
Thanks,
Assaf


Boxuan Li
Hi Assaf,
I see. That makes sense, and unfortunately I don't have a perfect solution. I would suggest using a mixed index instead. Regarding the data model, you can take a look at a blog post I wrote earlier: https://liboxuan.medium.com/janusgraphdeepdivepart2demystifyindexingd26e71edb386
In short, it's like storing a vertex, except that the label value itself becomes a vertex and all indexed vertices become edges to that label vertex. So yes, if you have too many vertices with the same label (it usually becomes a problem when you have millions of such vertices), the corresponding composite index row will be very large.
Best,
Boxuan
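The layout described above can be modeled with a toy in-memory structure: one row per distinct indexed value, with one entry per indexed vertex appended to that row, much like edges hanging off the "value vertex". This is a simplified Python illustration, not JanusGraph's actual storage format. `ToyCompositeIndex`, `ASSUMED_BYTES_PER_ENTRY` and `ROW_LIMIT_BYTES` are hypothetical names, and the fixed per-entry size is an assumption (real entries are variable-length encodings), so the byte figures only show how a row grows with the number of vertices per value.

```python
from collections import defaultdict
from typing import Dict, List

# Assumption: a fixed size per index entry. Real JanusGraph entries are
# variable-length encodings, so this only illustrates how rows scale.
ASSUMED_BYTES_PER_ENTRY = 32
# The 256MB BigTable row limit from the thread title, taken as 256 MiB.
ROW_LIMIT_BYTES = 256 * 1024 * 1024


class ToyCompositeIndex:
    """Toy model: one row per indexed value, one entry per indexed vertex."""

    def __init__(self) -> None:
        self.rows: Dict[str, List[int]] = defaultdict(list)

    def add(self, vertex_id: int, value: str) -> None:
        # Analogous to an edge from the "value vertex" to the indexed vertex:
        # every vertex with the same value lands in the same row.
        self.rows[value].append(vertex_id)

    def lookup(self, value: str) -> List[int]:
        # .get() avoids creating an empty row for unknown values.
        return list(self.rows.get(value, []))

    def row_bytes(self, value: str) -> int:
        return len(self.rows.get(value, [])) * ASSUMED_BYTES_PER_ENTRY

    def oversized_rows(self, limit: int = ROW_LIMIT_BYTES) -> List[str]:
        return [value for value in self.rows if self.row_bytes(value) > limit]


if __name__ == "__main__":
    idx = ToyCompositeIndex()
    for vid in range(1_000_000):
        idx.add(vid, "person")  # one million vertices, one value
    idx.add(1_000_000, "company")
    print(len(idx.rows))  # 2
    print(idx.row_bytes("person"))  # 32000000
    print(idx.oversized_rows(limit=10_000_000))  # ['person']
```

However many vertices are added, the number of rows stays equal to the number of distinct values; only the row for the dominant value keeps growing.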


schwartz@...
That's a great post. This is exactly the use case we have, with a type property.
Regarding the usage of mixed indexes: I'm less concerned about property updates in this case (as opposed to inserts), since the type/label of a vertex won't change. However, I've seen cases in the past where queries relying on a mixed index fail while the index backend still hasn't caught up to the storage backend. I'm guessing there's no way around that? For another approach, how about cleanups / TTLs / etc.? Assuming the business model can sustain this, is there a way to compute, for a specific vertex label, what will be the upper bound for the number of vertices?


Boxuan Li
> I've seen cases in the past, where queries relying on a mixed index fail while the index backend still hasn't caught up to the storage backend.
Yes, that could happen. You can use https://docs.janusgraph.org/operations/recovery/#transactionfailure to tackle this problem, but it also means you have an additional long-running process to maintain.

> How about cleanups / TTLs / etc.?
I'm not sure I understand this correctly. In your business model, are some vertices less important, such that they can be deleted? If frequent cleanup / TTL means that the total number of vertices will drop significantly, then yes, that will help.

> what will be the upper bound for the number of vertices
My empirical number is a few million vertices with the same "type" (indexed by a composite index).
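The empirical figure can be put next to the 256MB row limit from the thread title with back-of-the-envelope arithmetic: the number of vertices that can share one indexed value is roughly the row limit divided by the average bytes per index entry, scaled down by a safety margin. This Python sketch is not from the thread; `max_vertices_per_value`, `bytes_per_entry` and `safety_factor` are hypothetical, and the example inputs are illustrative values, not measurements. The per-entry size would have to be measured in your own deployment.

```python
import math


def max_vertices_per_value(row_limit_bytes: int, bytes_per_entry: float,
                           safety_factor: float = 0.5) -> int:
    """Rough upper bound on how many vertices can share one composite-index value.

    bytes_per_entry: average size of one index entry (assumed; measure it).
    safety_factor: fraction of the row limit you are willing to use (assumed).
    """
    if bytes_per_entry <= 0:
        raise ValueError("bytes_per_entry must be positive")
    if not 0 < safety_factor <= 1:
        raise ValueError("safety_factor must be in (0, 1]")
    return math.floor(row_limit_bytes * safety_factor / bytes_per_entry)


if __name__ == "__main__":
    limit = 256 * 1024 * 1024  # 256MB row limit, taken as 256 MiB
    # Illustrative inputs only, not measurements:
    print(max_vertices_per_value(limit, bytes_per_entry=64, safety_factor=1.0))  # 4194304
    print(max_vertices_per_value(limit, bytes_per_entry=40))  # 3355443
```

The bound scales inversely with the entry size, so it is only as good as the `bytes_per_entry` estimate; the safety factor leaves headroom because rows approaching the hard limit are likely to cause trouble before they actually exceed it.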

