Redundant data store when using BerkeleyDB as the storage backend?
I am trying to figure out the storage format of JanusGraph. When loading a very simple graph with BerkeleyDB as the storage backend, I found something very confusing which I may need your help to clarify...
1. An edge (with an edge label, but no property) is stored in multiple key-value records, where only the last few bits of the key differ and the value field is empty.
2. For an edge with three properties, three key-value records are inserted into BerkeleyDB. I found that the value fields of these records get progressively longer. It turns out that the first record stores only the first property, the second record stores the first two properties, and the third record stores all three properties.
My questions are, (1) are these observations correct? (2) if they are correct, what are the ideas behind these designs, and how are they used?
Thanks in advance ;-)
Dmitry Kovalev <dk.g...@...>
I don't think anybody here will be able to confirm specifically if your observations are correct or not without a specific reproducible example.
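One way to make such observations easier to share is to dump the raw records and mark where the keys start to differ. The following is a minimal sketch in Python; the helper names (shared_prefix_len, describe_records) and the sample bytes are made up for illustration and are not real JanusGraph or BerkeleyDB output.

```python
def shared_prefix_len(keys):
    """Return the number of leading bytes common to every key."""
    if not keys:
        return 0
    shortest = min(len(k) for k in keys)
    for i in range(shortest):
        # Stop at the first position where the keys disagree.
        if len({k[i] for k in keys}) > 1:
            return i
    return shortest


def describe_records(records):
    """Format (key, value) byte pairs, splitting each key at the shared prefix."""
    keys = [k for k, _ in records]
    n = shared_prefix_len(keys)
    lines = []
    for key, value in records:
        lines.append(
            f"{key[:n].hex()}|{key[n:].hex()}  value[{len(value)}]={value.hex()}"
        )
    return lines


# Hypothetical sample: two keys that differ only in the last byte, empty values.
sample = [
    (bytes.fromhex("0a0b0c01"), b""),
    (bytes.fromhex("0a0b0c02"), b""),
]

for line in describe_records(sample):
    print(line)
```

Posting output in this form alongside the code that created the graph would let others see exactly which part of each key varies and how long each value is.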
However, I think I have a possible explanation of why you are seeing more data stored than you expect. When you commit a graph transaction adding an edge or a vertex, JanusGraph may actually write to multiple backend stores, depending on configuration:
- vertex store (the actual data)
- index store (updates the index for searching, essentially duplicating the value of the indexed property)
- transaction log store (can be used for replication etc)
- id block store to manage id block allocation
Some of these stores (e.g. the transaction log), as you may guess from the names, are write-only, i.e. they can only grow.
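The fan-out described above can be illustrated with a toy model. This is only a sketch in Python under stated assumptions: ToyBackend, commit_edge, record_counts, and the key layouts are hypothetical and do not reflect JanusGraph's actual on-disk format or API. It only shows how one logical write can produce records in several stores.

```python
from collections import defaultdict


class ToyBackend:
    """Hypothetical stand-in for a backend that holds several named stores."""

    def __init__(self, indexed_keys=(), log_enabled=False):
        self.stores = defaultdict(dict)
        self.indexed_keys = set(indexed_keys)
        self.log_enabled = log_enabled
        self._log_seq = 0

    def commit_edge(self, edge_id, out_v, in_v, label, props):
        # The actual data (key layout is invented for this sketch).
        self.stores["vertex store"][(out_v, label, edge_id)] = (in_v, dict(props))
        # Index entries repeat the value of every indexed property.
        for name, value in props.items():
            if name in self.indexed_keys:
                self.stores["index store"][(name, value, edge_id)] = b""
        # The log only ever gains entries; nothing here deletes from it.
        if self.log_enabled:
            self._log_seq += 1
            self.stores["transaction log store"][self._log_seq] = ("add-edge", edge_id)

    def record_counts(self):
        """Return the number of records currently held by each store."""
        return {name: len(store) for name, store in self.stores.items()}


backend = ToyBackend(indexed_keys={"since"}, log_enabled=True)
backend.commit_edge(1, 10, 20, "knows", {"since": 2019, "weight": 0.5})
print(backend.record_counts())
```

In this toy setup a single edge with one indexed property yields three records across three stores, which is the kind of growth that can look like redundancy when inspecting the backend directly.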
On Monday, 16 December 2019 13:09:56 UTC, br...@... wrote: