Re: Anyone with experience of adding new Storage backend for JanusGraph ? [Help needed w.r.t SnowFlake]

Debasish Kanhar <d.k...@...>

Hi Dimitriy,

Sorry about the late response. I was working on this project only part time until last week, when we moved to full-time development for this PoC. Thanks to your pointers and Jason's, we have been able to start development and have some groundwork to build on :-)

So, we are modelling SnowFlake (which is like a SQL file store) as a key-value store by creating two columns, "Key" and "Value", in each table. We are going to define the data type as binary (or stringified binary) so that arbitrary data can be stored. (I believe this corresponds to a StaticBuffer key and a StaticBuffer value. Is that correct?)
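
If the stringified option is chosen and the order of keys matters (as it would for an ordered key-value store), the text encoding has to sort the same way as the raw bytes do. Below is a minimal sketch, assuming fixed-width lowercase hex as the encoding and assuming the string column is compared by plain character code order. The class HexKeyCodec and its methods are hypothetical names for illustration, not part of JanusGraph or SnowFlake.

```java
public class HexKeyCodec {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Encode each byte as exactly two lowercase hex characters.
    public static String encode(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            int v = b & 0xFF; // treat the byte as unsigned
            sb.append(HEX[v >>> 4]).append(HEX[v & 0x0F]);
        }
        return sb.toString();
    }

    // Reverse of encode(): two hex characters become one byte.
    public static byte[] decode(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Unsigned lexicographic comparison of raw byte arrays.
    public static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    }
}
```

With this encoding, comparing two encoded strings gives the same sign as comparing the underlying bytes as unsigned values. Standard Base64 would not keep this property, because its alphabet is not in ascending character order.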

Since we are modelling SnowFlake as a key-value store, does it make sense to have the SnowFlakeManager class implement OrderedKeyValueStore, as is done for BerkeleyJE? Is that the correct understanding?

As an update, we have almost finished developing the SnowFlakeManager class. The required methods, such as beginTransaction and openDatabase, are implemented. The one method not yet done is mutateMany, but it will be done, as it in turn calls the KeyValueStore.insert() method.

Many of the basic functions in KeyValueStore are also done, such as insert (insert a binary key-value pair), get (look up by binary key) and delete (delete a row by binary key). We are stuck at the function getSlice(). What does it do?

We are wondering how getSlice operates. I know that the function is used when querying JanusGraph with Gremlin queries (read operations). We see that a sliceQuery is generated, which is then executed against the backend to get results.
Now, my question is this: a slice query is used while querying for the relations of a vertex (edges/properties), by slicing those relations based on filters/conditions. The following steps are followed in the getSlice function (BerkeleyJEKeyValueStore for BerkeleyDB and ColumnValueStore for inmemory):
  1. Find the row from the passed key. (Returns a Binary value against the binary key)
  2. Fetch the slice boundaries, i.e. slice start and slice end, from the query passed in.
  3. Apply the slice boundaries to the value returned in step 1, or else fetch the results of step 1 with the slicing conditions already applied.
My question is about the last step. Since my data in the DB is just "binary key, binary value", how can we apply additional constraints (slice conditions) in the query? There is no additional metadata to apply the slice to, as I have just 2 columns in my table.

I hope my explanation was clear enough to follow. I primarily want to know how the last step would work in the data model I described above (2 columns, one for the key and one for the value, each of a stringified binary data type). And is the chosen data model good enough?
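
One possible way to make the last step work with only two columns is sketched below. It assumes that the stored key is the row key concatenated with the column name, that all row keys have the same fixed length, and that the table can be scanned in unsigned byte order of the key. A slice then becomes a plain range scan over keys. As far as I understand, this is also the idea behind JanusGraph's OrderedKeyValueStoreAdapter, but the sketch does not use or reproduce that class. CompositeKeySlice and its methods are hypothetical names, and a TreeMap stands in for the SnowFlake table.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class CompositeKeySlice {
    // Unsigned lexicographic order over raw bytes.
    private static final Comparator<byte[]> UNSIGNED = CompositeKeySlice::compareUnsigned;

    // Stand-in for the two-column table: composite key -> value, sorted by key.
    private final NavigableMap<byte[], byte[]> table = new TreeMap<>(UNSIGNED);

    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    static byte[] concat(byte[] rowKey, byte[] column) {
        byte[] out = Arrays.copyOf(rowKey, rowKey.length + column.length);
        System.arraycopy(column, 0, out, rowKey.length, column.length);
        return out;
    }

    // One logical "column" of a row becomes one physical key-value row.
    public void insert(byte[] rowKey, byte[] column, byte[] value) {
        table.put(concat(rowKey, column), value);
    }

    // Returns (column, value) pairs with sliceStart <= column < sliceEnd.
    public List<Map.Entry<byte[], byte[]>> getSlice(byte[] rowKey, byte[] sliceStart, byte[] sliceEnd) {
        byte[] from = concat(rowKey, sliceStart);
        byte[] to = concat(rowKey, sliceEnd);
        List<Map.Entry<byte[], byte[]>> result = new ArrayList<>();
        for (Map.Entry<byte[], byte[]> e : table.subMap(from, true, to, false).entrySet()) {
            // Strip the fixed-length row key prefix to recover the column.
            byte[] column = Arrays.copyOfRange(e.getKey(), rowKey.length, e.getKey().length);
            result.add(new AbstractMap.SimpleEntry<>(column, e.getValue()));
        }
        return result;
    }
}
```

Against a SQL-like table, the subMap call would correspond to a query that selects rows whose key lies between the two composite bounds, ordered by key. The fixed-length row key assumption matters: with variable-length row keys, the boundary between row key and column would need to be encoded explicitly.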

Thanks in advance. And I promise this time my replies will be quicker :-)

On Friday, 25 October 2019 03:17:24 UTC+5:30, Dmitry Kovalev wrote:
Hi Debashish,

here are my 2 cents:

First of all, you need to be clear with yourself as to why exactly you want to build a new backend. E.g. do you find that the existing ones are sub-optimal for certain use cases, or that they are too hard to set up, or do you just want to provide a backend for a cool new database in the hope that it will increase adoption, or something else? In other words, do you have a clear idea of what this new backend is going to provide that the existing ones do not, e.g. advanced scalability, performance or ease of setup, or just an option for people with existing Snowflake infra to put it to a new use?

Second, you are almost correct, in that basically all you need to implement are three interfaces:
- KeyColumnValueStoreManager, which allows opening multiple instances of named KeyColumnValueStores and provides a certain level of transactional context between the different stores it has opened
- KeyColumnValueStore, which represents an ordered collection of "rows" accessible by keys, where each row is a
- KeyValueStore, basically an ordered collection of key-value pairs, which can be thought of as the individual "columns" of that row and their respective values

Both row and column keys, and the data values are generic byte data.

Have a look at this piece of documentation:    

Possibly the simplest way to understand the "minimum contract" required by JanusGraph from a backend is to look at the inmemory backend. You will see that:
- KeyColumnValueStoreManager is conceptually a Map of store name -> KeyColumnValueStore,
- each KeyColumnValueStore is conceptually a NavigableMap of "rows" or KeyValueStores (i.e. a "table"),
- each KeyValueStore is conceptually an ordered collection of key -> value pairs ("columns").
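
The three levels above can be pictured as nested maps. The following is only a toy sketch of that concept, not the actual inmemory backend: it uses String in place of generic byte data to stay short, and ToyKcvManager, put and the store name used in the usage note are hypothetical. Only the method names openDatabase and getSlice are borrowed from this thread.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

public class ToyKcvManager {
    // store name -> (row key -> (column -> value))
    private final Map<String, NavigableMap<String, NavigableMap<String, String>>> stores =
            new HashMap<>();

    // Opens (or creates) a named store, i.e. an ordered map of rows.
    public NavigableMap<String, NavigableMap<String, String>> openDatabase(String name) {
        return stores.computeIfAbsent(name, n -> new TreeMap<>());
    }

    // Writes one column of one row, creating the row if needed.
    public void put(String store, String rowKey, String column, String value) {
        openDatabase(store)
                .computeIfAbsent(rowKey, k -> new TreeMap<>())
                .put(column, value);
    }

    // Finds the row, then returns its columns with start <= column < end.
    public SortedMap<String, String> getSlice(String store, String rowKey, String start, String end) {
        NavigableMap<String, String> row = openDatabase(store).get(rowKey);
        if (row == null) {
            return Collections.emptySortedMap();
        }
        return row.subMap(start, true, end, false);
    }
}
```

For example, after put("demo", "v1", "a", "1"), put("demo", "v1", "b", "2") and put("demo", "v1", "c", "3"), the call getSlice("demo", "v1", "a", "c") returns the columns "a" and "b" but not "c", since the end of the slice is treated as exclusive here.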

In the most basic case, once you implement these three relatively simple interfaces, JanusGraph can take care of all the translation of graph operations, such as adding vertices and edges, and of Gremlin queries, into a series of read-write operations over a collection of KCV stores. When you open a new graph, JanusGraph asks the KeyColumnValueStoreManager implementation to create a number of specially named KeyColumnValueStores, which it uses to store vertices, edges and various indices. It also creates a number of "utility" stores, which it uses internally for locking, id management etc.

Crucially, whatever stores JanusGraph creates in your backend implementation, and whatever it is using them for, you only need to make sure that you implement those basic interfaces, which allow storing arbitrary byte data and accessing it by arbitrary byte keys.

So for your first "naive" implementation, you most probably shouldn't worry too much about the translation of the graph model to the KCVS model and back; this is what JanusGraph itself is mostly about anyway. Just use StoreFeatures to tell JanusGraph that your backend supports only the most basic operations, and concentrate on thinking about how to best implement the KCVS interfaces with your underlying database/storage system.

Of course, after that, as you start thinking of supporting better levels of consistency/transaction management across multiple stores, about performance, better utilising native indexing/query mechanisms, separate indexing backends, support for distributed backend model etc etc - you will find that there is more to it, and this is where you can gain further insights from the documentation, existing backend sources and asking more specific questions.

See for example this piece of documentation:

Hope this helps,

On Thu, 24 Oct 2019 at 21:27, Debasish Kanhar <d...@...> wrote:
I know that JanusGraph needs a column-family type NoSQL database as its storage backend, which is why we have Scylla, Cassandra, HBase etc. SnowFlake isn't a column-family database, but it has a column data type which can store any sort of data. So we could store complete JSON-oriented column-family data there after massaging / pre-processing the data. Is that a practical thought? Is it practical enough to implement?

If it is practical enough to implement, what needs to be done? I'm going through the source code, and I'm basing my ideas on my understanding of the janusgraph-cassandra and janusgraph-berkeleyje projects. Please correct me if I'm wrong in my understanding.

  1. We need a StoreManager class, like HBaseStoreManager, AbstractCassandraStoreManager or BerkeleyJEStoreManager, which extends either DistributedStoreManager or LocalStoreManager and implements KeyColumnValueStoreManager, right? These classes need to build a features object, which is more or less like the storage connection configuration. They also need a beginTransaction method, which creates the actual connection to the corresponding storage backend. Is that correct?
  2. We need corresponding Transaction classes, which create the transaction against the corresponding backend, like *CassandraTransaction* or *BerkeleyJETx*. The transaction class needs to extend the AbstractStoreTransaction class. Though I can see and understand the transaction being created in BerkeleyJETx, I don't see something similar for CassandraTransaction. Am I missing something in my understanding here?
  3. We need a KeyColumnValueStore class for the backend, like *AstyanaxKeyColumnValueStore* or *BerkeleyJEKeyValueStore*. They need to extend KeyColumnValueStore. This class takes care of massaging the data into KeyColumnFormat so that it can then be inserted into the corresponding table inside the storage backend.
    1. So the questions on my mind are: what will the structure of those classes be?
    2. Are there some methods which always need to be present? I see getSlice() being used across all these classes. Also, how do they work?
    3. Do they just convert incoming Gremlin queries into the KeyColumnValue structure?
    4. Are there any other classes I'm missing, or are these 3 the only ones that need to be written to create a new storage backend?
    5. Also, if these 3 are the only classes needed, and let's say we succeed in using SnowFlake as a storage backend, how does the read/query side of JanusGraph get solved? Are any changes needed on that end as well, or is JanusGraph abstracted enough that it can simply start reading from the new source?
  4. I also thought there would be some classes which read in "Gremlin queries", do certain "pre-processing into certain data structures (tabular)" and then push the result through some connection into the respective backends. This is where we could use help: is there a way to visualize those objects after "pre-processing", then store them as they are in SnowFlake and reuse them to fulfil Gremlin queries?

I know we can store arbitrary objects in SnowFlake; I am just looking at the changes needed at the JanusGraph level to achieve this.

Any help will be really appreciated.

Thanks in Advance.
