Re: Anyone with experience of adding new Storage backend for JanusGraph ? [Help needed w.r.t SnowFlake]
Dmitry Kovalev <dk.g...@...>
here are my 2 cents:
First of all, you need to be clear with yourself about why exactly you want to build a new backend. For example, do you find the existing ones sub-optimal for certain use cases, or too hard to set up, or do you just want to provide a backend for a cool new database in the hope that it will increase adoption, or something else? In other words, do you have a clear idea of what this new backend will provide that the existing ones do not, e.g. better scalability, performance, or ease of setup, or simply an option for people with existing Snowflake infrastructure to put it to a new use?
Second, you are almost correct, in that basically all you need to implement are three interfaces:
- KeyColumnValueStoreManager, which allows opening multiple instances of named KeyColumnValueStores and provides a certain level of transactional context between the different stores it has opened
- KeyColumnValueStore, which represents an ordered collection of "rows" accessible by keys, where each row is a KeyValueStore
- KeyValueStore, basically an ordered collection of key-value pairs, which can be thought of as the individual "columns" of that row and their respective values
Row keys, column keys, and data values are all generic byte data.
Have a look at this piece of documentation: https://docs.janusgraph.org/advanced-topics/data-model/
Possibly the simplest way to understand the "minimum contract" that JanusGraph requires from a backend is to look at the inmemory backend. You will see that:
- KeyColumnValueStoreManager is conceptually a Map of store name -> KeyColumnValueStore,
- each KeyColumnValueStore is conceptually a NavigableMap of "rows" or KeyValueStores (i.e. a "table"),
- each KeyValueStore is conceptually an ordered collection of key -> value pairs ("columns").
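To make that picture concrete, here is a minimal sketch in plain Java of the nested-map model described above. It is not taken from the inmemory backend: the class and method names (ToyStoreManager, ToyStore, openStore, put, get) are hypothetical and are not JanusGraph's actual interfaces. The unsigned lexicographic ordering of byte keys is also an assumption of the sketch (Arrays.compareUnsigned needs Java 9 or later).

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for a store manager: a map of store name -> store.
final class ToyStoreManager {
    private final Map<String, ToyStore> stores = new HashMap<>();

    // Opening the same name twice returns the same store instance.
    ToyStore openStore(String name) {
        return stores.computeIfAbsent(name, n -> new ToyStore());
    }
}

// Hypothetical stand-in for one store ("table"): row key -> (column -> value).
final class ToyStore {
    // Assumption: keys are ordered as unsigned bytes, compared lexicographically.
    static final Comparator<byte[]> UNSIGNED_ORDER = Arrays::compareUnsigned;

    private final NavigableMap<byte[], NavigableMap<byte[], byte[]>> rows =
            new TreeMap<>(UNSIGNED_ORDER);

    // Store one column value in a row, creating the row if it does not exist yet.
    void put(byte[] rowKey, byte[] column, byte[] value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<byte[], byte[]>(UNSIGNED_ORDER))
            .put(column, value);
    }

    // Return the value of one column, or null if the row or column is absent.
    byte[] get(byte[] rowKey, byte[] column) {
        NavigableMap<byte[], byte[]> row = rows.get(rowKey);
        return row == null ? null : row.get(column);
    }
}
```

Note that an explicit comparator is needed because Java arrays do not implement Comparable and use identity for equals/hashCode, so a plain HashMap or TreeMap keyed by byte[] would not behave as a byte-keyed store.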
In the most basic case, once you implement these three relatively simple interfaces, JanusGraph can take care of all the translation of graph operations such as adding vertices and edges, and of gremlin queries, into a series of read-write operations over a collection of KCV stores. When you open a new graph, JanusGraph asks the KeyColumnValueStoreManager implementation to create a number of specially named KeyColumnValueStores, which it uses to store vertices, edges, and various indices. It also creates a number of "utility" stores which it uses internally for locking, id management etc.
Crucially, whatever stores JanusGraph creates in your backend implementation, and whatever it uses them for, you only need to make sure that you implement those basic interfaces, which allow storing arbitrary byte data and accessing it by arbitrary byte keys.
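As a rough illustration of what "storing arbitrary byte data and accessing it by arbitrary byte keys" can boil down to, here is a sketch of one store with a write operation and a ranged read over the columns of a row. Everything in it is an assumption made for illustration: SliceStoreSketch, mutate and getSlice are simplified, hypothetical signatures rather than the real KeyColumnValueStore methods, and the half-open column range, the unsigned byte ordering, and applying deletions before additions are choices of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical, simplified store: not the real KeyColumnValueStore signatures.
final class SliceStoreSketch {
    // Assumption: unsigned lexicographic byte order (Arrays.compareUnsigned, Java 9+).
    private static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    private final NavigableMap<byte[], NavigableMap<byte[], byte[]>> rows =
            new TreeMap<>(UNSIGNED);

    // Apply column deletions, then column additions, to a single row.
    void mutate(byte[] key, Map<byte[], byte[]> additions, List<byte[]> deletions) {
        NavigableMap<byte[], byte[]> row =
                rows.computeIfAbsent(key, k -> new TreeMap<byte[], byte[]>(UNSIGNED));
        for (byte[] column : deletions) {
            row.remove(column);
        }
        row.putAll(additions);
        if (row.isEmpty()) {
            rows.remove(key); // do not keep empty rows around
        }
    }

    // Return the values of the columns in [start, end) of one row, in column order.
    List<byte[]> getSlice(byte[] key, byte[] start, byte[] end) {
        NavigableMap<byte[], byte[]> row = rows.get(key);
        if (row == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(row.subMap(start, true, end, false).values());
    }
}
```

The store never interprets the bytes it is given. In this sketch, all that the graph layer would need from it is that keys and columns come back in a consistent order and that a ranged read over one row is possible.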
So for your first "naive" implementation, you most probably shouldn't worry too much about translating the graph model to the KCVS model and back; this is what JanusGraph itself is mostly about anyway. Just use StoreFeatures to tell JanusGraph that your backend supports only the most basic operations, and concentrate on how best to implement the KCVS interfaces on top of your underlying database/storage system.
Of course, after that, once you start thinking about better consistency and transaction management across multiple stores, performance, better use of native indexing and query mechanisms, separate indexing backends, support for a distributed backend model and so on, you will find that there is more to it. This is where you can gain further insights from the documentation, the existing backend sources, and by asking more specific questions.
See for example this piece of documentation: https://docs.janusgraph.org/advanced-topics/eventual-consistency/
Hope this helps,
On Thu, 24 Oct 2019 at 21:27, Debasish Kanhar <d.k...@...> wrote: