I know that JanusGraph needs a column-family-type NoSQL database as its storage backend, which is why we have Scylla, Cassandra, HBase, etc. Snowflake isn't a column-family database, but it has a column data type that can store any sort of semi-structured data. So we could store complete JSON-oriented column-family data there after massaging / pre-processing the data. Is that a practical thought? Is it practical enough to implement?
If it is practical enough to implement, what needs to be done? I'm going through the source code, and I'm basing my ideas on my understanding of the `janusgraph-cassandra` and `janusgraph-berkeleyje` projects. Please correct me if my understanding is wrong.
We need a `StoreManager` class like `HBaseStoreManager`, `AbstractCassandraStoreManager`, or `BerkeleyJEStoreManager`, which extends either `DistributedStoreManager` or `LocalStoreManager` and implements the `KeyColumnValueStoreManager` interface, right? This class needs to build a `features` object (`StoreFeatures`), which is more or less the storage backend's capability/connection configuration. It also needs a `beginTransaction` method that creates the actual connection to the corresponding storage backend. Is that correct?
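To make my mental model concrete, here is a minimal sketch of what I imagine such a manager would look like for Snowflake. Everything Snowflake-specific (`SnowflakeStoreManager`, `SnowflakeTx`, `SnowflakeKeyColumnValueStore`, the `SnowflakeJdbc` helper) is my invention; only the JanusGraph SPI types are real, and their exact method set varies between JanusGraph versions:

```java
import java.util.List;
import java.util.Map;

import org.janusgraph.diskstorage.BackendException;
import org.janusgraph.diskstorage.BaseTransactionConfig;
import org.janusgraph.diskstorage.StaticBuffer;
import org.janusgraph.diskstorage.StoreMetaData;
import org.janusgraph.diskstorage.keycolumnvalue.*;

// Hypothetical Snowflake backend manager; a sketch, not working code.
public class SnowflakeStoreManager implements KeyColumnValueStoreManager {

    // getFeatures() advertises capabilities rather than connection settings;
    // JanusGraph adapts its query plans to what the backend claims to support.
    private final StoreFeatures features = new StandardStoreFeatures.Builder()
            .keyOrdered(false)
            .orderedScan(false)
            .unorderedScan(true)
            .batchMutation(true)
            .distributed(true)
            .build();

    @Override
    public StoreTransaction beginTransaction(BaseTransactionConfig config) throws BackendException {
        // This is where the actual session/connection to the backend is opened.
        return new SnowflakeTx(config, SnowflakeJdbc.openConnection()); // assumed helper
    }

    @Override
    public KeyColumnValueStore openDatabase(String name, StoreMetaData.Container metaData) throws BackendException {
        // One store per JanusGraph "database" (edgestore, graphindex, ...),
        // e.g. one Snowflake table each.
        return new SnowflakeKeyColumnValueStore(name);
    }

    @Override
    public void mutateMany(Map<String, Map<StaticBuffer, KCVMutation>> mutations, StoreTransaction txh) throws BackendException {
        // Fan batched additions/deletions out to the individual stores.
    }

    @Override public StoreFeatures getFeatures() { return features; }
    @Override public String getName() { return "snowflake"; }
    @Override public List<KeyRange> getLocalKeyPartition() throws BackendException {
        throw new UnsupportedOperationException();
    }
    @Override public boolean exists() throws BackendException { return true; }
    @Override public void clearStorage() throws BackendException { }
    @Override public void close() throws BackendException { }
}
```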
We will also need corresponding transaction classes that open the transaction against the backend, like `CassandraTransaction` or `BerkeleyJETx`. The transaction class needs to extend the `AbstractStoreTransaction` class. Though I can see and understand the transaction being created in `BerkeleyJETx`, I don't see something similar in `CassandraTransaction`. Am I missing something in my understanding here?
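For the transaction side, my reading of `BerkeleyJETx` suggests something like the following hedged sketch. Again, everything Snowflake-specific is made up; `AbstractStoreTransaction` provides no-op `commit()`/`rollback()` methods that a backend overrides:

```java
import java.sql.Connection;
import java.sql.SQLException;

import org.janusgraph.diskstorage.BackendException;
import org.janusgraph.diskstorage.BaseTransactionConfig;
import org.janusgraph.diskstorage.PermanentBackendException;
import org.janusgraph.diskstorage.common.AbstractStoreTransaction;

// Hypothetical transaction wrapper holding a JDBC session to Snowflake.
public class SnowflakeTx extends AbstractStoreTransaction {

    private final Connection connection;

    public SnowflakeTx(BaseTransactionConfig config, Connection connection) {
        super(config);
        this.connection = connection;
    }

    Connection getConnection() {
        return connection; // the stores would use this to issue statements
    }

    @Override
    public void commit() throws BackendException {
        try {
            connection.commit();
            connection.close();
        } catch (SQLException e) {
            throw new PermanentBackendException("Could not commit Snowflake transaction", e);
        }
    }

    @Override
    public void rollback() throws BackendException {
        try {
            connection.rollback();
            connection.close();
        } catch (SQLException e) {
            throw new PermanentBackendException("Could not roll back Snowflake transaction", e);
        }
    }
}
```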
We also need a `KeyColumnValueStore` class for the backend, like `AstyanaxKeyColumnValueStore` or `BerkeleyJEKeyValueStore`, which implements the `KeyColumnValueStore` interface. This class takes care of massaging the data into the key-column-value format so that it can be inserted into the corresponding table inside the storage backend.
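Here too, a hedged sketch of what I imagine (the Snowflake class is hypothetical; the interface methods are the JanusGraph ones, though the exact set differs between versions):

```java
import java.util.List;
import java.util.Map;

import org.janusgraph.diskstorage.BackendException;
import org.janusgraph.diskstorage.Entry;
import org.janusgraph.diskstorage.EntryList;
import org.janusgraph.diskstorage.StaticBuffer;
import org.janusgraph.diskstorage.keycolumnvalue.*;

// Hypothetical store: one instance per JanusGraph store name (edgestore, ...).
public class SnowflakeKeyColumnValueStore implements KeyColumnValueStore {

    private final String name;

    public SnowflakeKeyColumnValueStore(String name) {
        this.name = name;
    }

    @Override
    public EntryList getSlice(KeySliceQuery query, StoreTransaction txh) throws BackendException {
        // Return the (column, value) pairs of row query.getKey() whose column
        // falls in [query.getSliceStart(), query.getSliceEnd()), sorted by
        // column and respecting query.getLimit(). For Snowflake this would
        // become a SELECT against the table backing this store.
        throw new UnsupportedOperationException("sketch only");
    }

    @Override
    public Map<StaticBuffer, EntryList> getSlice(List<StaticBuffer> keys, SliceQuery query,
                                                 StoreTransaction txh) throws BackendException {
        throw new UnsupportedOperationException("sketch only"); // multi-key variant
    }

    @Override
    public void mutate(StaticBuffer key, List<Entry> additions, List<StaticBuffer> deletions,
                       StoreTransaction txh) throws BackendException {
        // Write path: apply column additions and deletions for one row key.
    }

    @Override
    public void acquireLock(StaticBuffer key, StaticBuffer column, StaticBuffer expectedValue,
                            StoreTransaction txh) throws BackendException {
        throw new UnsupportedOperationException("no locking in this sketch");
    }

    @Override
    public KeyIterator getKeys(KeyRangeQuery query, StoreTransaction txh) throws BackendException {
        throw new UnsupportedOperationException("requires ordered keys");
    }

    @Override
    public KeyIterator getKeys(SliceQuery query, StoreTransaction txh) throws BackendException {
        throw new UnsupportedOperationException("sketch only");
    }

    @Override public String getName() { return name; }
    @Override public void close() throws BackendException { }
}
```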
So the questions on my mind are: what will the structure of those classes be?
Are there some methods that always need to be present? For instance, I see `getSlice()` being used across all these classes. Also, how do they work?
Do they just convert incoming Gremlin queries into the key-column-value structure? (See the snippet below for how I currently picture a slice read.)
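To make the `getSlice()` question concrete, here is roughly how I picture a slice read being issued against such a store. The key and range values below are made up for illustration; `BufferUtil` and `KeySliceQuery` are real JanusGraph utility types:

```java
import org.janusgraph.diskstorage.BackendException;
import org.janusgraph.diskstorage.EntryList;
import org.janusgraph.diskstorage.StaticBuffer;
import org.janusgraph.diskstorage.keycolumnvalue.KeyColumnValueStore;
import org.janusgraph.diskstorage.keycolumnvalue.KeySliceQuery;
import org.janusgraph.diskstorage.keycolumnvalue.StoreTransaction;
import org.janusgraph.diskstorage.util.BufferUtil;

public class SliceReadExample {
    // A slice read asks one row (key) for a contiguous, sorted range of columns.
    // JanusGraph encodes a vertex id into the key and the edge/property bounds
    // into the column range; the backend only ever sees opaque byte buffers.
    static EntryList readRow(KeyColumnValueStore store, StoreTransaction tx)
            throws BackendException {
        StaticBuffer key = BufferUtil.getLongBuffer(42L);   // illustrative row key
        KeySliceQuery query = new KeySliceQuery(key,
                BufferUtil.zeroBuffer(8),                   // slice start, inclusive
                BufferUtil.oneBuffer(8));                   // slice end, exclusive
        return store.getSlice(query, tx);                   // sorted (column, value) entries
    }
}
```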
Are there any other classes I'm missing, or are these three the only ones that need to be implemented to create a new storage backend?
Also, if these three are the only classes needed, and let's say we succeed in using Snowflake as a storage backend, how does the read/query aspect of JanusGraph get solved? Are changes needed on that end as well, or is JanusGraph abstracted enough that it can start reading from the new source as-is?
And I thought there would be some classes that read in Gremlin queries, pre-process them into certain (tabular) data structures, and then push those through some connection into the respective backend. This is where I need help: is there a way to visualize those objects after the pre-processing, store them as-is in Snowflake, and reuse them to fulfill Gremlin queries?
I know we can store arbitrary objects in Snowflake; I'm just looking at the changes needed at the JanusGraph level to achieve this.
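For instance, assuming the key-column-value triples could be captured at the `mutate()` boundary, something like this sketch could serialize them into a JSON document suitable for a Snowflake VARIANT column. The hex encoding and document shape here are entirely my own assumptions, not anything JanusGraph prescribes:

```java
import java.util.List;

import org.janusgraph.diskstorage.Entry;
import org.janusgraph.diskstorage.StaticBuffer;

// Sketch: turn one row's additions into a JSON document that could be
// stored in (and parsed back out of) a Snowflake VARIANT column.
public class KcvJson {

    static String toJson(StaticBuffer key, List<Entry> additions) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"key\":\"").append(hex(key)).append("\",\"columns\":[");
        for (int i = 0; i < additions.size(); i++) {
            Entry e = additions.get(i);
            if (i > 0) sb.append(',');
            sb.append("{\"c\":\"").append(hex(e.getColumn()))
              .append("\",\"v\":\"").append(hex(e.getValue())).append("\"}");
        }
        return sb.append("]}").toString();
    }

    // The buffers are opaque to the backend, so hex-encode them losslessly.
    static String hex(StaticBuffer b) {
        byte[] bytes = b.as(StaticBuffer.ARRAY_FACTORY);
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte x : bytes) sb.append(String.format("%02x", x));
        return sb.toString();
    }
}
```

The read path would then be the inverse: `getSlice()` would parse such a document back and filter the `c` values into the requested column range, which is exactly the part I'd like to validate before going further.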