Debasish Kanhar <d.k...@...>
I know that JanusGraph needs a column-family-style NoSQL database as its storage backend, which is why we have Scylla, Cassandra, HBase etc. Snowflake isn't a column-family database, but it has a column data type which can store any sort of data. So we could store complete JSON-oriented column-family data there after massaging / pre-processing the data. Is that a practical thought? Is it practical enough to implement?
If it is practical enough to implement, what needs to be done? I'm going through the source code, and I'm basing my ideas on my understanding of the janusgraph-cassandra and janusgraph-berkeleyje projects. Please correct me if I'm wrong in my understanding.
- We need a StoreManager class like HBaseStoreManager, AbstractCassandraStoreManager or BerkeleyJEStoreManager, which extends either DistributedStoreManager or LocalStoreManager and implements KeyColumnValueStoreManager, right? These classes need to build a features object, which is more or less the storage connection configuration. They also need a beginTransaction method which creates the actual connection to the corresponding storage backend. Is that correct?
- We will need corresponding Transaction classes which create the transaction against the corresponding backend, like *CassandraTransaction* or *BerkeleyJETx*. The transaction class needs to extend the AbstractStoreTransaction class (see the transaction sketch after this list). Though I can see and understand the transaction being created in BerkeleyJETx, I don't see something similar for CassandraTransaction, so am I missing something in my understanding here?
- We need a KeyColumnValueStore class for the backend, like *AstyanaxKeyColumnValueStore* or *BerkeleyJEKeyValueStore*. They need to extend KeyColumnValueStore. This class takes care of massaging the data into key-column format so that it can be inserted into the corresponding table inside the storage backend.
- So the questions on my mind are: what will be the structure of those classes?
- Are there some methods which always need to be present? I see getSlice() being used across all these classes. Also, how do they work?
- Do they just convert incoming gremlin queries into a KeyColumnValue structure?
- Are there any other classes I'm missing, or are these three the only ones that need to be implemented to create a new storage backend?
- Also, if these three are the only classes needed, and let's say we succeed in using Snowflake as a storage backend, how does the read/query side of JanusGraph get solved? Are there any changes needed on that end as well, or is JanusGraph abstracted enough that it can start reading from the new source?
- And, I thought there would be some classes which read in the "gremlin queries", do certain "pre-processing into certain data structures (tabular)", and then push the result through some connection into the respective backend. This is where we need help: is there a way to see those objects after the "pre-processing", store them as-is in Snowflake, and reuse them to fulfil gremlin queries?
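To make the transaction question above concrete, here is a minimal sketch of what such a transaction class could look like for a JDBC-style backend. It is only an illustration under assumptions: the class itself and the connection handling are hypothetical, and the JanusGraph class/package names (AbstractStoreTransaction, BaseTransactionConfig, BackendException) should be checked against the version you build on. As far as I recall, a backend without real transactions (like Cassandra) leaves commit/rollback as near no-ops, which is why CassandraTransaction looks so thin compared to BerkeleyJETx.

    import java.sql.Connection;
    import java.sql.SQLException;

    import org.janusgraph.diskstorage.BackendException;
    import org.janusgraph.diskstorage.BaseTransactionConfig;
    import org.janusgraph.diskstorage.PermanentBackendException;
    import org.janusgraph.diskstorage.common.AbstractStoreTransaction;

    // Hypothetical transaction wrapper around a JDBC connection, in the same
    // spirit as BerkeleyJETx wrapping a native BerkeleyDB transaction.
    public class SnowflakeTx extends AbstractStoreTransaction {

        private final Connection connection;

        public SnowflakeTx(BaseTransactionConfig config, Connection connection) {
            super(config);
            this.connection = connection;
        }

        public Connection getConnection() {
            return connection;
        }

        @Override
        public void commit() throws BackendException {
            try {
                connection.commit();
                connection.close();
            } catch (SQLException e) {
                throw new PermanentBackendException("Could not commit transaction", e);
            }
        }

        @Override
        public void rollback() throws BackendException {
            try {
                connection.rollback();
                connection.close();
            } catch (SQLException e) {
                throw new PermanentBackendException("Could not roll back transaction", e);
            }
        }
    }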
I know we can store arbitrary objects in Snowflake; I'm just looking at the changes needed at the JanusGraph level to achieve this.
Any help will be really appreciated.
Thanks in Advance.
Dmitry Kovalev <dk.g...@...>
Hi Debasish,
here are my 2 cents:
First of all, you need to be clear with yourself as to why exactly you want to build a new backend. E.g. do you find that the existing ones are sub-optimal for certain use cases, or that they are too hard to set up, or do you just want to provide a backend for a cool new database in the hope that it will increase adoption, or something else? In other words, do you have a clear idea of what this new backend is going to provide which the existing ones do not, e.g. advanced scalability or performance or ease of setup, or just an option for people with existing Snowflake infra to put it to a new use?
Second, you are almost correct, in that basically all you need to implement are three interfaces:
- KeyColumnValueStoreManager, which allows opening multiple instances of named KeyColumnValueStores and provides a certain level of transactional context between the different stores it has opened,
- KeyColumnValueStore, which represents an ordered collection of "rows" accessible by keys, where each row is a
- KeyValueStore, basically an ordered collection of key-value pairs, which can be thought of as the individual "columns" of that row and their respective values.
Both row and column keys, and the data values, are generic byte data.
Possibly the simplest way to understand the "minimum contract" required by JanusGraph from a backend is to look at the inmemory backend. You will see that:
- KeyColumnValueStoreManager is conceptually a Map of store name -> KeyColumnValueStore,
- each KeyColumnValueStore is conceptually a NavigableMap of "rows" or KeyValueStores (i.e. a "table"),
- each KeyValueStore is conceptually an ordered collection of key -> value pairs ("columns").
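Purely as an illustration of that nesting (not the actual JanusGraph classes, which use StaticBuffer keys, Entry lists and more compact structures), the conceptual model could be written like this:

    import java.util.Map;
    import java.util.NavigableMap;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentSkipListMap;

    // Conceptual sketch only: store name -> (row key -> (column key -> value)),
    // with rows and columns kept in unsigned lexicographic byte order, which is
    // the ordering JanusGraph expects from its backends.
    public class ConceptualKcvsModel {

        private final Map<String, NavigableMap<byte[], NavigableMap<byte[], byte[]>>> stores =
                new ConcurrentHashMap<>();

        public NavigableMap<byte[], NavigableMap<byte[], byte[]>> openStore(String name) {
            return stores.computeIfAbsent(name,
                    n -> new ConcurrentSkipListMap<>(ConceptualKcvsModel::compareUnsigned));
        }

        // Unsigned byte-wise comparison of keys.
        private static int compareUnsigned(byte[] a, byte[] b) {
            int len = Math.min(a.length, b.length);
            for (int i = 0; i < len; i++) {
                int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
                if (cmp != 0) return cmp;
            }
            return Integer.compare(a.length, b.length);
        }
    }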
In the most basic case, once you implement these three relatively simple interfaces, JanusGraph can take care of all the translation of graph operations such as adding vertices and edges, and of gremlin queries, into a series of read-write operations over a collection of KCV stores. When you open a new graph, JanusGraph asks the KeyColumnValueStoreManager implementation to create a number of specially named KeyColumnValueStores, which it uses to store vertices, edges, and various indices. It also creates a number of "utility" stores which it uses internally for locking, id management etc.
Crucially, whatever stores JanusGraph creates in your backend implementation, and whatever it is using them for, you only need to make sure that you implement those basic interfaces which allow storing arbitrary byte data and accessing it by arbitrary byte keys.
So for your first "naive" implementation, you most probably shouldn't worry too much about the translation of the graph model to the KCVS model and back - this is what JanusGraph itself is mostly about anyway. Just use StoreFeatures to tell JanusGraph that your backend supports only the most basic operations, and concentrate on how to best implement the KCVS interfaces with your underlying database/storage system.
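For example, a minimal features declaration could look roughly like the following. The builder method names are from memory and may differ slightly between JanusGraph versions, so treat this as a sketch and check StandardStoreFeatures in the release you target:

    import org.janusgraph.diskstorage.keycolumnvalue.StandardStoreFeatures;
    import org.janusgraph.diskstorage.keycolumnvalue.StoreFeatures;

    public class MinimalFeaturesExample {

        // Advertise only the basics: ordered keys and persistence, no native
        // transactions or batch mutations yet.
        public static StoreFeatures minimalFeatures() {
            return new StandardStoreFeatures.Builder()
                    .keyOrdered(true)       // keys are kept in byte order
                    .orderedScan(true)      // key-range scans are supported
                    .unorderedScan(true)    // full scans are supported
                    .persists(true)         // data survives restarts
                    .transactional(false)   // do not claim real transactions at first
                    .batchMutation(false)
                    .build();
        }
    }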
Of course, after that, as you start thinking about supporting better levels of consistency/transaction management across multiple stores, about performance, better utilising native indexing/query mechanisms, separate indexing backends, support for a distributed backend model etc. - you will find that there is more to it, and this is where you can gain further insights from the documentation, the existing backend sources, and asking more specific questions.
Hope this helps, Dmitry
Dmitry Kovalev <dk.g...@...>
One more useful pointer - CQLKeyColumnValueStore is a nice and compact example of how the KCVS interface discussed above can be implemented on top of Cassandra, using prepared CQL statements.
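Roughly, the idea is that one KCVS slice maps to a single prepared statement over a (key, column1, value) table. The table layout and name below are assumptions based on the default JanusGraph CQL schema, and the code is only a sketch using the DataStax Java driver, not the actual CQLKeyColumnValueStore code:

    import java.nio.ByteBuffer;
    import java.util.ArrayList;
    import java.util.List;

    import com.datastax.oss.driver.api.core.CqlSession;
    import com.datastax.oss.driver.api.core.cql.PreparedStatement;
    import com.datastax.oss.driver.api.core.cql.Row;

    public class CqlSliceSketch {

        // Fetch the values of all columns of one row whose column keys fall in
        // [sliceStart, sliceEnd) - essentially what a KCVS getSlice() does.
        public static List<ByteBuffer> slice(CqlSession session, ByteBuffer key,
                                             ByteBuffer sliceStart, ByteBuffer sliceEnd) {
            PreparedStatement ps = session.prepare(
                    "SELECT column1, value FROM janusgraph.edgestore " +
                    "WHERE key = ? AND column1 >= ? AND column1 < ?");
            List<ByteBuffer> values = new ArrayList<>();
            for (Row row : session.execute(ps.bind(key, sliceStart, sliceEnd))) {
                values.add(row.getByteBuffer("value"));
            }
            return values;
        }
    }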
Debasish Kanhar <d.k...@...>
Hi Dmitry,
Sorry about the late response. I was working on this project only part time until last week, when we moved into full-time development for this PoC. Thanks to your pointers and Jason's, we have been able to start the development work and have some groundwork to build on :-)
So, we are modelling Snowflake (which is like a SQL file store) as a key-value store by creating two columns, namely "Key" and "Value", in each table. We are going to define the data type as binary (or stringified binary) so that arbitrary data can be dumped (I believe the types are a StaticBuffer key and a StaticBuffer value - is that correct?).
Since we are modelling Snowflake as a key-value store, it makes sense to have a SnowFlakeManager class implement OrderedKeyValueStore, as for BerkeleyJE. Is that a correct understanding?
The update is that we have almost finished development of the SnowFlakeManager class. The required methods such as beginTransaction and openDatabase are implemented; one function that is not done yet is mutateMany, but it will be, as it in turn calls the KeyValueStore.insert() method.
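For reference, a minimal sketch of that two-column model over plain JDBC (the Snowflake driver exposes the standard java.sql API) might look like this. Table and column names are illustrative, and note that JanusGraph's insert is expected to overwrite an existing key, so a real implementation would need upsert semantics (e.g. MERGE) rather than a plain INSERT:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    // Hypothetical helper for a "kv_store(key BINARY, value BINARY)" table.
    public class KeyValueTableSketch {

        public static void insert(Connection conn, byte[] key, byte[] value) throws SQLException {
            // In practice this should be an upsert (MERGE), since insert() must overwrite.
            try (PreparedStatement ps =
                         conn.prepareStatement("INSERT INTO kv_store (key, value) VALUES (?, ?)")) {
                ps.setBytes(1, key);
                ps.setBytes(2, value);
                ps.executeUpdate();
            }
        }

        public static byte[] get(Connection conn, byte[] key) throws SQLException {
            try (PreparedStatement ps =
                         conn.prepareStatement("SELECT value FROM kv_store WHERE key = ?")) {
                ps.setBytes(1, key);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getBytes(1) : null;
                }
            }
        }

        public static void delete(Connection conn, byte[] key) throws SQLException {
            try (PreparedStatement ps =
                         conn.prepareStatement("DELETE FROM kv_store WHERE key = ?")) {
                ps.setBytes(1, key);
                ps.executeUpdate();
            }
        }
    }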
Also, a lot of the basic functions in the KeyValueStore are done, like insert (insert a binary key-value pair), get (get by binary key) and delete (delete a row by binary key). We are stuck at the getSlice() function. What does it do?
We are wondering how getSlice operates. I know the function is used when querying JanusGraph with gremlin queries (read operations) (https://github.com/BillBaird/delftswa-aurelius-titan/blob/master/SA-doc/Operations.md#queries). We see that a SliceQuery is generated, which is then executed against the backend to get results.
Now, my question is that a slice query is used when querying for the properties of vertices (edges/properties), by slicing the relations of a vertex based on filters/conditions. The following steps are followed in the getSlice function (BerkeleyJEKeyValueStore in berkeleyje & ColumnValueStore in inmemory):
- Find the row for the passed key (returns a binary value against the binary key).
- Fetch the slice boundaries, i.e. the slice start and end, from the query passed.
- Apply the slice boundaries to what was returned in step 1, i.e. fetch the matching results by applying the slicing conditions.
My question is about the last step. Since my data in the DB is just "binary key - binary value", how can we apply additional constraints (slice conditions) in the query? There is no additional metadata to slice on, as I only have 2 columns in my table.
Hope my explanation was clear. I primarily want to know how the last step would work in the data model I described above (having 2 columns, one for the key and one for the value, each of a stringified binary data type). And is the chosen data model good enough?
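For what it's worth, in the ordered key-value model the slice start and end are themselves keys, so (as far as I understand BerkeleyJEKeyValueStore) getSlice is a range scan over the key column rather than a filter inside the value blob. A rough JDBC sketch, with illustrative table/column names and assuming the database compares BINARY columns byte-wise in the same order JanusGraph expects:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.ArrayList;
    import java.util.List;

    public class KeyRangeSliceSketch {

        // Return (key, value) pairs whose key lies in [startKey, endKey), in key order,
        // up to 'limit' entries - the shape of an OrderedKeyValueStore slice.
        public static List<byte[][]> getSlice(Connection conn, byte[] startKey, byte[] endKey,
                                              int limit) throws SQLException {
            String sql = "SELECT key, value FROM kv_store "
                       + "WHERE key >= ? AND key < ? ORDER BY key LIMIT ?";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setBytes(1, startKey);
                ps.setBytes(2, endKey);
                ps.setInt(3, limit);
                List<byte[][]> entries = new ArrayList<>();
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        entries.add(new byte[][] { rs.getBytes(1), rs.getBytes(2) });
                    }
                }
                return entries;
            }
        }
    }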
Thanks in advance. And I promise this time my replies will be quicker :-)
Ryan Stauffer <ry...@...>
Debasish,
This sounds like an interesting project, but I do have a question about your choice of Snowflake. If I missed your response to this in the email chain, I apologize, but what problems with the existing high-performance backends (Scylla, for instance) are you trying to solve with Snowflake? The answer to that would probably inform your specific implementation over Snowflake.
Thanks, Ryan
Debasish Kanhar <d.k...@...>
Hi Ryan.
Well, that's a very valid question. The current implementations of backends like Scylla, as you mentioned, are really highly performant. There is no specific problem in mind, but of late I have been dealing with a lot of clients who are migrating their whole system to Snowflake, including all of their data storage and analytics components. Snowflake is a hot, upcoming data storage and warehousing system.
Those clients are really reluctant to add another storage component to their application. The reasons vary: high costs, added architectural complexity, or duplication of data across storages. But at the same time these clients also want to incorporate graph databases and graph analytics into their applications. This integration is targeted at those customers/clients who have migrated (or are migrating) to Snowflake and want a graph-based component as well. For now, it is simply not possible for them to run JanusGraph on top of their Snowflake data storage.
Hope I was able to explain it clearly :-)
Evgeniy Ignatiev <yevgeniy...@...>
Hi.
Is this backend open source, or will it be open-sourced?
Best regards,
Evgeniy Ignatiev.
On 11/28/2019 1:40 PM, Debasish Kanhar
wrote:
toggle quoted message
Show quoted text
Hi Ryan.
Well that's a very valid question you asked. The current
implementation of backends like Scylla as you mentioned are
really highly performant. There is no specific problem in
mind, but off late I have been dealing with a lot of clients
who are migrating their whole system into SnowFlake, including
the whole Data storage and Analytics components as well.
SnowFlake is a hot upcoming Data storage and warehousing
system.
Those clients are really reluctant to add another storage
component to their application. Reasons can be a lot like due
to high costs, or added complexity of their architecture, or
duplication of data across storages. But at the same time
these clients also want to incorporate Graph Databases and
Graph Analytics into their application as well. This
integration is targeted for those set of customers/clients who
are/have migrating/migrated into SnowFlake and want to have
Graph based component as well. For now, its simply not
possible for them to have JanusGraph with their SnowFlake data
storages.
Hope I was able to explain it clearly :-)
On Wednesday, 27 November 2019 20:40:52 UTC+5:30, Ryan Stauffer
wrote:
Debasish,
This sounds like an interesting project, but I do
have a question about your choice of Snowflake. If I
missed your response to this in the email chain, I
apologize, but what problems with the existing
high-performance backends (Scylla, for instance) are you
trying to solve with Snowflake? The answer to that
would probably inform your specific implementation over
Snowflake.
Thanks,
Ryan
On Wed, Nov 27, 2019 at 3:18 AM Debasish
Kanhar < d...@...> wrote:
Hi Dimitriy,
Sorry about the late response. I was working on
this project part time only till last week when we
moved into full time dev for this PoC. Really thanks
to your pointers and Jason's that we have been able
to start with the development works and we have some
ground work to start with :-)
So,we are modelling SnowFlake (Which is like SQL
File store) as a Key-Value store by creating two
columns namely "Key" and "Value" in each tables. We
are going to define the data type as binary here (Or
Stringified Binary) so that arbitrary data can be
dumped (I feel its of type StaticBuffer Key and
StaticBuffer value. Is that correct? )
Since, we are modelling SnowFlake as Key-Value
store, it makes sense to have a SnowFlakeManager
class implement OrderedKeyValueStore like
for BerkleyJE? Is that correct understanding?
Updates are that we have almost finished
development of SnowFlakeManager class. The required
methods needed are implemented like beginTransaction,
openDatabase though one particular function
not done is mutateMany is not done, but it will be
done as it in turn calls KeyValueStore.insert()
method.
Also, a lot of basic functions in KeyValueStore
is also done like insert (Insert binary key-value),
get (Get from binary key), delete (Delete a row
using binary key). We are kinda stuck at the
function getSlice(). What does it do?
Now, my question here is that, slice query is
used while queryingfor properties for vertices
(edges/properties) by slicing the relations of
vertex and slicing them based on filters/conditions.
The following steps are followed in getSlice
function (BerkleyKeyValueStore - berkleydb &
ColumnValueStore - inmemory) :
- Find the row from the passed key. (Returns a
Binary value against the binary key)
- Fetch slice bounderies, i.e. slice start and
end from query passed
- Apply the slice boundries on the returned
value in 1st step else, fetch the first results
(pt 1) by applying the slicing conditions in
step
My question is related to last step. Since my
data in DB is just "Binary Key-Binary Value", how
can we apply another constraints (slice
conditions) in query? It just doesn't have any
additional meta data to apply slice on as I just
have 2 columns in my table.
Hope my explaination was clear for you to
understand. I want to know primarily how the last
step would work in the data model I described
above (Having 2 columns, one for Key and other for
Value. And each of stringified binary data type).
And, is the data model selected good enough?
Thanks in advance. And I promise this time my
replies will be quicker :-)
|
|
Debasish Kanhar <d.k...@...>
Hi Evgeniy,
Thanks for the question. We plan to open-source it once implemented, but we are still a long way from implementation. We will be really grateful to anyone in the community who can help in any way to achieve this :-)
On Thursday, 28 November 2019 16:16:27 UTC+5:30, Evgeniy Ignatiev wrote:
Hi.
Is this backend open-source/will be open-sourced?
Best regards,
Evgeniy Ignatiev.
On 11/28/2019 1:40 PM, Debasish Kanhar
wrote:
Hi Ryan.
Well, that's a very valid question. The current implementations of backends like Scylla, as you mentioned, are really highly performant. There is no specific problem in mind, but of late I have been dealing with a lot of clients who are migrating their whole system to SnowFlake, including all of their data storage and analytics components. SnowFlake is an up-and-coming data storage and warehousing system.
Those clients are really reluctant to add another storage component to their application. The reasons vary: high costs, added complexity in their architecture, or duplication of data across storages. But at the same time these clients also want to incorporate graph databases and graph analytics into their applications. This integration is targeted at those customers/clients who have migrated (or are migrating) to SnowFlake and want a graph-based component as well. For now, it is simply not possible for them to have JanusGraph alongside their SnowFlake data storage.
Hope I was able to explain it clearly :-)
On Wednesday, 27 November 2019 20:40:52 UTC+5:30, Ryan Stauffer
wrote:
Debasish,
This sounds like an interesting project, but I do
have a question about your choice of Snowflake. If I
missed your response to this in the email chain, I
apologize, but what problems with the existing
high-performance backends (Scylla, for instance) are you
trying to solve with Snowflake? The answer to that
would probably inform your specific implementation over
Snowflake.
Thanks,
Ryan
|
|
Debasish Kanhar <d.k...@...>
Hello any developers following this thread:
As Dmitry suggested, the CQL adapter uses prepared statements, which suits me in the sense that I'll be using SQL statements (SnowSQL) to query SnowFlake through a DAO. The CQL adapter and the SnowFlake adapter I'm building would therefore be similar, so it makes sense to use the CQL adapter as a reference.
As mentioned before, I'm currently blocked on the getSlice method. I know the method is used while querying the data, but I'm unable to get my head around how it works internally. A blind implementation might work, but it wouldn't give me an understanding of what is happening internally. If anyone can help me understand how it works, a similar implementation for SnowFlake becomes much easier.
As mentioned, I'm basing my understanding on the CQL adapter. If we look at the getSlice method in CQLKeyColumnValueStore, it uses the this.getSlice prepared statement to fulfill the query. this.getSlice is built as follows:
this.getSlice = this.session.prepare(select()
    .column(COLUMN_COLUMN_NAME)
    .column(VALUE_COLUMN_NAME)
    .fcall(WRITETIME_FUNCTION_NAME, column(VALUE_COLUMN_NAME)).as(WRITETIME_COLUMN_NAME)
    .fcall(TTL_FUNCTION_NAME, column(VALUE_COLUMN_NAME)).as(TTL_COLUMN_NAME)
    .from(this.storeManager.getKeyspaceName(), this.tableName)
    .where(eq(KEY_COLUMN_NAME, bindMarker(KEY_BINDING)))
    .and(gte(COLUMN_COLUMN_NAME, bindMarker(SLICE_START_BINDING)))
    .and(lt(COLUMN_COLUMN_NAME, bindMarker(SLICE_END_BINDING)))
    .limit(bindMarker(LIMIT_BINDING)));
this.getSlice is then used in the public EntryList getSlice(...) method, which binds and executes the prepared statement above. The body of that method does the following:
final Future<EntryList> result = Future.fromJavaFuture(
    this.executorService,
    this.session.executeAsync(this.getSlice.bind()
        .setBytes(KEY_BINDING, query.getKey().asByteBuffer())
        .setBytes(SLICE_START_BINDING, query.getSliceStart().asByteBuffer())
        .setBytes(SLICE_END_BINDING, query.getSliceEnd().asByteBuffer())
        .setInt(LIMIT_BINDING, query.getLimit())
        .setConsistencyLevel(getTransaction(txh).getReadConsistencyLevel())))
    .map(resultSet -> fromResultSet(resultSet, this.getter));
interruptibleWait(result);
Is the following understanding correct? Anyone with JanusGraph and Cassandra expertise, please help.
My reading is that the bind call effectively fills the base query in as:
.where(eq(KEY_COLUMN_NAME, query.getKey().asByteBuffer()))
.and(gte(COLUMN_COLUMN_NAME, query.getSliceStart().asByteBuffer()))
.and(lt(COLUMN_COLUMN_NAME, query.getSliceEnd().asByteBuffer()))
.limit(query.getLimit());
Is that interpolation correct?
So, if we were to model this in an RDBMS (SnowFlake, for example - though SnowFlake isn't an RDBMS, it is similar in terms of storage and query engine) with 3 columns (key, value, column1), each of a string data type (varchar holding binary data), would a query like the following be correct?
SELECT .... FROM keyspace
WHERE key = query.getKey().asByteBuffer()
  AND column1 >= query.getSliceStart().asByteBuffer()
  AND column1 < query.getSliceEnd().asByteBuffer()
LIMIT query.getLimit()
Does this sort of query achieve something similar to what the CQL statement targets? If I can understand the actual meaning of the prepared statements here, I can also build my understanding of the rest of the methods required for doing mutations in the underlying backend.
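For what it's worth, here is a minimal JDBC-style sketch of that slice read over a hypothetical three-column table edgestore(key, column1, value), just to make the parameter binding explicit. The table name, column names, types and LIMIT syntax are assumptions for illustration only - not an actual Snowflake schema and not JanusGraph code:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a slice read over a relational table
//   edgestore(key BINARY, column1 BINARY, value BINARY)
// mirroring the CQL statement above: equality on key plus a half-open range on column1.
public class RelationalSliceSketch {

    // Some databases do not accept a bind variable in LIMIT; inline the limit if yours does not.
    private static final String SLICE_SQL =
        "SELECT column1, value FROM edgestore " +
        "WHERE key = ? AND column1 >= ? AND column1 < ? " +
        "ORDER BY column1 " +
        "LIMIT ?";

    /** Returns the (column, value) pairs of one row key, restricted to [sliceStart, sliceEnd). */
    public static List<byte[][]> getSlice(Connection conn, byte[] key, byte[] sliceStart,
                                          byte[] sliceEnd, int limit) throws Exception {
        List<byte[][]> entries = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SLICE_SQL)) {
            ps.setBytes(1, key);         // query.getKey()
            ps.setBytes(2, sliceStart);  // query.getSliceStart()
            ps.setBytes(3, sliceEnd);    // query.getSliceEnd()
            ps.setInt(4, limit);         // query.getLimit()
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    entries.add(new byte[][]{rs.getBytes("column1"), rs.getBytes("value")});
                }
            }
        }
        return entries;
    }
}

One caveat: the range predicates are only correct if the database compares the column1 values in unsigned lexicographic byte order (which is what Cassandra's clustering order on a blob column gives you); if the binary data is stored as a varchar, the collation may not match, so that is worth verifying before relying on this.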
Any help is really appreciated, as we are getting tighter and tighter on the deadline for the feasibility PoC of SnowFlake as a backend for JanusGraph.
Thanks in advance
|
|
Dmitry Kovalev <dk.g...@...>
Hi Debashish,
in terms of wrapping one's head around what the getSlice() method does - conceptually it is not hard to understand if you peruse the link I referred you to in my original reply:
The relevant part of it is really short so I'll just copy it here (with added emphasis in bold):
===quote===
Bigtable Data Model
Under the Bigtable data model each table is a collection of rows. Each row is uniquely identified by a key. Each row is comprised of an arbitrary (large, but limited) number of cells. A cell is composed of a column and value. A cell is uniquely identified by a column within a given row. Rows in the Bigtable model are called "wide rows" because they support a large number of cells and the columns of those cells don’t have to be defined up front as is required in relational databases. JanusGraph has an additional requirement for the Bigtable data model: The cells must be sorted by their columns and a subset of the cells specified by a column range must be efficiently retrievable (e.g. by using index structures, skip lists, or binary search).
===/quote===
Basically, the getSlice method is the formal representation of the above requirement in bold: based on the order defined over the "column key" space, it should return all "columns" whose keys lie between the start and end key values given in the SliceQuery - that is, >= sliceStart and < sliceEnd (the end bound is exclusive, matching the gte/lt pair in the CQL prepared statement you quoted). Please refer to the javadoc for more detail.
However, answering the question of how to implement this effectively in your backend is pretty much the crux of your potential contribution. If the underlying DB's data model more or less "natively" supports the above (as in the case of Cassandra, BDB etc.), then it becomes relatively easy. If the underlying data model is different, then it gets us back to the question which has been asked a couple of times in this thread, i.e. whether it is actually feasible and/or desirable to try to implement it.
For example, in order to implement it in a "classical" RDBMS, you would have to find one which supports ordering and indexing of byte columns/blobs, and then probably encounter scalability issues if you chose to model the whole key-column-value store as one table with row key, column key and data... It might still be possible to address these issues and implement it reasonably effectively, but it is unclear what the point would be: you would effectively have to circumvent the "relational/SQL" abstraction layer, which is the whole point of an RDBMS, to get back to lower-level implementation details.
Unfortunately I know nothing about Snowflake and its data model, and don't have the time to learn about it in sufficient detail any time soon, so I cannot really advise you on either feasibility or implementation details.
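To make the slice contract described above concrete, here is a tiny, self-contained illustration (plain Java; the names are hypothetical and this is not JanusGraph's actual API) of one "wide row" whose cells are kept sorted by column key, and of what a column-range slice returns:

import java.util.NavigableMap;
import java.util.TreeMap;

// One "wide row": column key -> value, kept sorted by column key.
// A slice [sliceStart, sliceEnd) returns exactly the cells whose column keys fall in that range.
public class WideRowSliceSketch {
    public static void main(String[] args) {
        NavigableMap<Integer, String> row = new TreeMap<>();  // ints stand in for binary column keys
        row.put(0x10, "cell A");
        row.put(0x20, "cell B");
        row.put(0x30, "cell C");
        row.put(0x40, "cell D");

        int sliceStart = 0x20;  // inclusive
        int sliceEnd   = 0x40;  // exclusive

        // Conceptually, getSlice(rowKey, sliceStart, sliceEnd) -> {0x20=cell B, 0x30=cell C}
        System.out.println(row.subMap(sliceStart, true, sliceEnd, false));
    }
}

The efficiency requirement in the quoted paragraph amounts to being able to compute that sub-range without scanning the whole row.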
Hope this helps,
Dmitry
toggle quoted message
Show quoted text
On Sun, 1 Dec 2019 at 09:04, Debasish Kanhar < d.k...@...> wrote: Hello any developers following this thread:
As suggested by Dimitry, CQL adopter uses prepared statements, and hence that would be appropriate for me in sense that, I'll be using SQL statements (SnowSQL) for SnowFlake querying using a DAO. Thus CQL and SnowFlake adopter I'm building would be similar and hence makes sense to reference out of those.
As mentioned before, I'm currently blocked at the method getSlice. I know that the method is used while querying the data, but I'm unable to get my head around how does it work internally. A blind implementation might work, but it won't give me an understanding how its working internally. If anyone can help me understand how it works, a similar implementation for SnowFlake becomes easier then.
As mentioned before I'm basing my understanding from CQL adopter. If we look at CQLKeyColumnValueStore under getSlice method, it makes use of this.getSlice prepared statement to fulfill query. The this.getSlice is as follows:
this.getSlice = this.session.prepare(select() .column(COLUMN_COLUMN_NAME) .column(VALUE_COLUMN_NAME) .fcall(WRITETIME_FUNCTION_NAME, column(VALUE_COLUMN_NAME)).as(WRITETIME_COLUMN_NAME) .fcall(TTL_FUNCTION_NAME, column(VALUE_COLUMN_NAME)).as(TTL_COLUMN_NAME) .from(this.storeManager.getKeyspaceName(), this.tableName) .where(eq(KEY_COLUMN_NAME, bindMarker(KEY_BINDING))) .and(gte(COLUMN_COLUMN_NAME, bindMarker(SLICE_START_BINDING))) .and(lt(COLUMN_COLUMN_NAME, bindMarker(SLICE_END_BINDING))) .limit(bindMarker(LIMIT_BINDING)));
The this.getSlice() is used in the method public EntryList getSlice() which uses the prepared statement above to execute some query. When the following happens (Contents of getSlice method)
final Future<EntryList> result = Future.fromJavaFuture( this.executorService, this.session.executeAsync(this.getSlice.bind() .setBytes(KEY_BINDING, query.getKey().asByteBuffer()) .setBytes(SLICE_START_BINDING, query.getSliceStart().asByteBuffer()) .setBytes(SLICE_END_BINDING, query.getSliceEnd().asByteBuffer()) .setInt(LIMIT_BINDING, query.getLimit()) .setConsistencyLevel(getTransaction(txh).getReadConsistencyLevel()))) .map(resultSet -> fromResultSet(resultSet, this.getter)); interruptibleWait(result);
Is following understanding correct? Anyone with JanusGraph and Cassandra expertise can help.
I'm updating the base query from following bindings:
.where(eq(KEY_COLUMN_NAME, query.getKey().asByteBuffer())) .and(gte(COLUMN_COLUMN_NAME, query.getSliceStart().asByteBuffer())) .and(lt(COLUMN_COLUMN_NAME, query.getSliceEnd().asByteBuffer())) .limit(query.getLimit()));
Is above interpolation correct?
So, if we were to model this in any RDBMS (SnowFlake for eg though SnowFlake isn't RDBMS, it is similar in terms of storage and query engine) with 3 columns as (key, value, column1) of datatypes string (varchar with binary info) can something similar query be correct?
SELECT .... FROM keyspace WHERE key = query.getKey().asByteBuffer() and column1 >= query.getSliceStart().asByteBuffer() and column1 < query.getSliceEnd().asByteBuffer()
limit query.getLimit()
Does this sort of query sound similar in terms of what is targeted to achieve? If I can understand the actual meaning of the prepared statements here, I can also base my undertandings for rest of methods which would be required for doing mutations in underlaying backend.
Any help is really appreciated as we are kinda getting tighter and tighter on deadline regarding the feasibility PoC of SnowFlake as backend for JanusGraph.
Thanks in advance
On Thursday, 28 November 2019 21:05:09 UTC+5:30, Debasish Kanhar wrote: Hi Evgeniy,
Thanks for the question. We plan to open source it once implemented but we are still long way from implementation. Will be really grateful to community who can help in any way to achieve this :-)
On Thursday, 28 November 2019 16:16:27 UTC+5:30, Evgeniy Ignatiev wrote:
Hi.
Is this backend open-source/will be open-sourced?
Best regards,
Evgeniy Ignatiev.
On 11/28/2019 1:40 PM, Debasish Kanhar
wrote:
Hi Ryan.
Well that's a very valid question you asked. The current
implementation of backends like Scylla as you mentioned are
really highly performant. There is no specific problem in
mind, but off late I have been dealing with a lot of clients
who are migrating their whole system into SnowFlake, including
the whole Data storage and Analytics components as well.
SnowFlake is a hot upcoming Data storage and warehousing
system.
Those clients are really reluctant to add another storage
component to their application. Reasons can be a lot like due
to high costs, or added complexity of their architecture, or
duplication of data across storages. But at the same time
these clients also want to incorporate Graph Databases and
Graph Analytics into their application as well. This
integration is targeted for those set of customers/clients who
are/have migrating/migrated into SnowFlake and want to have
Graph based component as well. For now, its simply not
possible for them to have JanusGraph with their SnowFlake data
storages.
Hope I was able to explain it clearly :-)
On Wednesday, 27 November 2019 20:40:52 UTC+5:30, Ryan Stauffer
wrote:
Debasish,
This sounds like an interesting project, but I do
have a question about your choice of Snowflake. If I
missed your response to this in the email chain, I
apologize, but what problems with the existing
high-performance backends (Scylla, for instance) are you
trying to solve with Snowflake? The answer to that
would probably inform your specific implementation over
Snowflake.
Thanks,
Ryan
On Wed, Nov 27, 2019 at 3:18 AM Debasish
Kanhar < d...@...> wrote:
Hi Dimitriy,
Sorry about the late response. I was working on
this project part time only till last week when we
moved into full time dev for this PoC. Really thanks
to your pointers and Jason's that we have been able
to start with the development works and we have some
ground work to start with :-)
So,we are modelling SnowFlake (Which is like SQL
File store) as a Key-Value store by creating two
columns namely "Key" and "Value" in each tables. We
are going to define the data type as binary here (Or
Stringified Binary) so that arbitrary data can be
dumped (I feel its of type StaticBuffer Key and
StaticBuffer value. Is that correct? )
Since, we are modelling SnowFlake as Key-Value
store, it makes sense to have a SnowFlakeManager
class implement OrderedKeyValueStore like
for BerkleyJE? Is that correct understanding?
Updates are that we have almost finished
development of SnowFlakeManager class. The required
methods needed are implemented like beginTransaction,
openDatabase though one particular function
not done is mutateMany is not done, but it will be
done as it in turn calls KeyValueStore.insert()
method.
Also, a lot of basic functions in KeyValueStore
is also done like insert (Insert binary key-value),
get (Get from binary key), delete (Delete a row
using binary key). We are kinda stuck at the
function getSlice(). What does it do?
Now, my question here is that, slice query is
used while queryingfor properties for vertices
(edges/properties) by slicing the relations of
vertex and slicing them based on filters/conditions.
The following steps are followed in getSlice
function (BerkleyKeyValueStore - berkleydb &
ColumnValueStore - inmemory) :
- Find the row from the passed key. (Returns a
Binary value against the binary key)
- Fetch slice bounderies, i.e. slice start and
end from query passed
- Apply the slice boundries on the returned
value in 1st step else, fetch the first results
(pt 1) by applying the slicing conditions in
step
My question is related to last step. Since my
data in DB is just "Binary Key-Binary Value", how
can we apply another constraints (slice
conditions) in query? It just doesn't have any
additional meta data to apply slice on as I just
have 2 columns in my table.
Hope my explaination was clear for you to
understand. I want to know primarily how the last
step would work in the data model I described
above (Having 2 columns, one for Key and other for
Value. And each of stringified binary data type).
And, is the data model selected good enough?
Thanks in advance. And I promise this time my
replies will be quicker :-)
On Friday, 25 October 2019 03:17:24 UTC+5:30, Dmitry
Kovalev wrote:
Hi Debashish,
here are my 2 cents:
First of all, you need to be clear with
yourself as to why exactly you want to build a
new backend? E.g. do you find that the
existing ones are sub-optimal for certain use
cases, or they are too hard to set up, or you
just want to provide a backend to a cool new
database in the hope that it will increase
adoption, or smth else? In other words, do you
have a clear idea of what is this new backend
going to provide which the existing ones do
not, e.g. advanced scalability or performance
or ease of setup, or just an option for people
with existing Snowflake infra to put it to a
new use?
Second, you are almost correct, in that
basically all you need to implement are three
interfaces:
- KeyColumnValueStoreManager, which allows
opening multiple instances of named
KeyColumnValueStores and provides a certain
level of transactional context between
different stores it has opened
-
KeyColumnValueStore - which represents an
ordered collection of "rows" accessible by
keys, where each row is a
- KeyValueStore - basically an ordered
collection of key-value pairs, which can be
though of as individual "columns" of that row,
and their respective values
Both row and column keys, and the data
values are generic byte data.
Possibly the simplest way to understand the
"minimum contract" required by Janusgraph from
a backend is to look at the inmemory backend.
You will see that:
- KeyColumnValueStoreManager is
conceptually a Map of store name ->
KeyColumnValueStore,
- each
KeyColumnValueStore is conceptually a
NavigableMap of "rows" or KeyValueStores (i.e.
a "table")
,
- each KeyValueStore is conceptually an
ordered collection of key -> value pairs
("columns").
In the most basic case, once you implement
these three relatively simple interfaces,
Janusgraph can take care of all the
translation of graph operations such as adding
vertices and edges, and of gremlin queries,
into a series of read-write operations over a
collection of KCV stores. When you open a new
graph, JanusGraph asks the
KeyColumnValueStoreManager implementation to
create a number of specially named
KeyColumnValueStores, which it uses to store
vertices, edges, and various indices. It
creates a number of "utility" stores which it
uses internally for locking, id management
etc.
Crucially, whatever stores Janusgraph
creates in your backend implementation, and
whatever it is using them for, you only need
to make sure that you implement those basic
interfaces which allow to store arbitrary byte
data and access it by arbitrary byte keys.
So for your first "naive" implementation,
you most probably shouldn't worry too much
about translation of graph model to KCVS model
and back - this is what Janusgraph itself is
mostly about anyway. Just use StoreFeatures to
tell Janusgraph that your backend supports
only most basic operations, and concentrate on
thinking how to best implement the KCVS
interfaces with your underlying
database/storage system.
Of course, after that, as you start
thinking of supporting better levels of
consistency/transaction management across
multiple stores, about performance, better
utilising native indexing/query mechanisms,
separate indexing backends, support for
distributed backend model etc etc - you will
find that there is more to it, and this is
where you can gain further insights from the
documentation, existing backend sources and
asking more specific questions.
Hope this helps,
Dmitry
On Thu, 24 Oct 2019 at 21:27,
Debasish Kanhar < d...@...>
wrote:
I
know that JanusGraph needs a column-family type
nosql database as storage backend, and
hence that is why we have Scylla,
Cassandra, HBase etc. SnowFlake isn't a
column family database, but it has a
column data type which can store any
sort of data. So we can store complete JSON Oriented Column family data here
after massaging / pre-processing the
data. Is that a practical thought? Is is
practical enough to implement?
If
it is practical enough to implement,
what needs to be done? I'm going through
the source code, and I'm basing my ideas
based on my understanding from janusgraph-cassandra and janusgraph-berkley projects.
Please correct me if I'm wrong in my
understanding.
- We
need to have a
StoreManager class
like HBaseStoreManager, AbstractCassandraStoreManager, BerkeleyJEStoreManager which
extends either DistributedStoreManager or LocalStoreManager and
implements KeyColumnValueStoreManager class
right? These class needs to have
build features object
which is more or less like storage
connection configuration. They need
to have a beginTransaction method
which creates the actual connection
to corresponding storage backend. Is
that correct?
- You
will need to have corresponding Transaction classes
which create the transaction to
corresponding backend like
*CassandraTransaction* or *BerkeleyJETx*. The transaction class needs to extend AbstractStoreTransaction`
class. Though I can see and
understand the transaction being
created in BerkeleyJETx I
don't see something similar for CassandraTransaction .
So am I missing something in my
undesrtanding here?
- You
need to have
KeyColumnValueStore class
for backend. Like *AsyntaxKeyColumnValueStore* or *BerkeleyJEKeyValueStore* etc.
They need to extend KeyColumnValueStore .
This class takes care of massaging
the data into KeyColumnFormat so
that they can then be inserted into
corresponding table inside Storage
Backend.
- So
question to my mind are, what will
be structure of those classes?
- Are
there some
methods which
needs to be present always like I
see getSlice() being
used across in all classes. Also,
how do they work?
- Do
they just
convert incoming gremlin queries into KeyColumnValue structure ?
- Are
there any other classes I'm
missing out on or these 3 are the
only ones needed to be modified to
create a new storage backend?
- Also,
if these 3 are only classes
needed, and let's say we success
in using SnowFlake as storage
backend, how do the
read aspect of janusgraph/query aspect gets
solved? Are there any changes
needed as well on that end or
JanusGraph is so abstracted that
it can now start picking up from
new source?
- And,
I thought there would be some
classes which would be reading in
from "gremlin queries" doing certain
"pre-processing into certain data
structures (tabular)" and then
pushed it through some connection
into respective backends. This is
where we cant help, is there a way
to visualize those objects after
"pre-processing" and then store
those objects as it is in SnowFlake
and reuse it to fulfill gremlin
queries.
I
know we can store random objects in
SnowFlake, just looking at changed
needed at JanusGraph level to achieve
those.
Any
help will be really appreciated.
Thanks
in Advance.
--
You received this message because you are
subscribed to the Google Groups "JanusGraph
developers" group.
To unsubscribe from this group and stop
receiving emails from it, send an email to jan...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/8169f717-9923-478d-b7f1-28d6ee894e9d%40googlegroups.com.
--
You received this message because you are subscribed to
the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails
from it, send an email to jan...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/fe1118aa-5132-44ed-b59e-209e9b7adaab%40googlegroups.com.

--
You received this message because you are subscribed to the Google
Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to jan...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/bc498b4e-6950-46b9-b7b9-a853da174830%40googlegroups.com.
--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgr...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/8558c1df-3750-4d0d-aafb-bf14e13a0de9%40googlegroups.com.
|
|
Dmitry Kovalev <dk.g...@...>
A few more things which you might find helpful:
1) To help you verify whether your implementation is correct, you can utilise the standard test suite which is used to test the correctness of all backends - more or less all you need to do is extend KeyColumnValueStoreTest and override the setup methods to provide an instance of your store implementation. See e.g. CQLStoreTest (a rough sketch of such a subclass follows after this list).
2) If you plan to opensource your work anyway, it may help if you make your work in progress public - e.g. put it on Github or Gitlab etc right now. This way you can refer to specific code, and may even get someone to help out. Of course it is not always possible depending on your company's policies, but if it is - it may be beneficial.
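For reference, here is a rough sketch of what such a test subclass could look like. The base-class package and the exact hook method name (openStorageManager below) are assumptions based on how the in-memory and CQL backend tests are wired, and SnowflakeStoreManager is of course the hypothetical manager under development - please check KeyColumnValueStoreTest and CQLStoreTest in your JanusGraph version for the real signatures:

import org.janusgraph.diskstorage.BackendException;
import org.janusgraph.diskstorage.KeyColumnValueStoreTest;
import org.janusgraph.diskstorage.keycolumnvalue.KeyColumnValueStoreManager;

// Reuses JanusGraph's generic KCVS test suite against the new backend.
// All Snowflake-specific names here are hypothetical.
public class SnowflakeStoreTest extends KeyColumnValueStoreTest {

    @Override
    public KeyColumnValueStoreManager openStorageManager() throws BackendException {
        // Return a manager wired to a scratch/test database or schema; configuration omitted.
        return new SnowflakeStoreManager(/* test configuration */);
    }
}

Running the inherited tests against this subclass exercises the basic store contract (including getSlice semantics) without having to write graph-level tests first.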
toggle quoted message
Show quoted text
|
|
Debasish Kanhar <d.k...@...>
Thanks, Dmitry, for the detailed explanation.
A few counter questions:
I had gone through the Bigtable data model you mentioned, and also the architecture diagram of the underlying Titan Blueprints graph (from a roughly six-year-old repo). I was able to deduce the internal structure of how Janus stores the intrinsic graph elements. It stores an adjacency list, as shown in the image you shared, where the key is the row key for the vertex (maybe a serialized version of its vertex ID or some unique identifier?). Apart from the row key, each vertex row has two components:
1: All unique properties/relations. Each in turn contains columns storing meta information for the relation, also sorted by a "column key", which is probably a combination of the relation name and some ID.
2: All non-unique relations. These sit under a super column containing multiple columns, sorted by "column key" and retrievable accordingly.
And I think these two types of relations are the ones queried/retrieved from Janus via the getSlice method. Is that correct?
But when I tried to model this as-is, I thought we would have columns corresponding to each relation/relation super column. To verify my understanding I queried the internal table structure in Cassandra, and as mentioned I saw just 3 columns, not the "N" columns per relation that I was expecting.
If we are to map those columns to the data structure defined above, how do they map?
Is key the same as the row key, value the collection of unique relations, and column1 the collection of all super columns?
If the above is correct, it helps a lot in understanding the getSlice method.
But then it brings me to the next question: whenever a row is inserted, it means that either a new vertex is added or an existing vertex is mutated in some way. So, based on the above understanding, "value" as a serialized format stays the same, and so does "column1", since they represent static information about a vertex, i.e. its relations and properties. So whenever you do "getSlice between sliceStart and sliceEnd", the results won't change unless the sliceStart and sliceEnd conditions change. Is that understanding correct as well?
So is this understanding correct too: fetch a row first, then fetch the subset of its properties if and only if they fall within the range of sliceStart and sliceEnd?
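To make sure I am picturing that last step correctly, here is a tiny self-contained sketch of what I understand it to be (just plain Java over an ordered map, not JanusGraph's actual inmemory or BerkeleyJE code; slice bounds are treated as start-inclusive / end-exclusive to match the gte/lt bindings in the CQL prepared statement):

import java.util.Comparator;
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

public class SliceSketch {

    // Unsigned lexicographic order over byte[] column keys, as the Bigtable model requires.
    static final Comparator<byte[]> UNSIGNED = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (c != 0) return c;
        }
        return Integer.compare(a.length, b.length);
    };

    // One "row": an ordered collection of (column key -> value) cells.
    static final NavigableMap<byte[], byte[]> row = new TreeMap<>(UNSIGNED);

    // Conceptual getSlice: all cells whose column key is >= sliceStart and < sliceEnd.
    static SortedMap<byte[], byte[]> getSlice(byte[] sliceStart, byte[] sliceEnd) {
        // A real implementation would also apply the query limit while iterating; omitted here.
        return row.subMap(sliceStart, true, sliceEnd, false);
    }

    public static void main(String[] args) {
        row.put(new byte[]{1}, "cell-1".getBytes());
        row.put(new byte[]{2}, "cell-2".getBytes());
        row.put(new byte[]{5}, "cell-5".getBytes());
        // Prints 2: the cells with column keys {1} and {2} fall in the slice, {5} does not.
        System.out.println(getSlice(new byte[]{0}, new byte[]{3}).size());
    }
}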
Also, you don't need to think of this from a SnowFlake perspective; think of it from an RDBMS perspective. Anything possible in an RDBMS is possible in SnowFlake, plus extra features, so if it is possible in an RDBMS it is logically possible in SnowFlake as well.
Really, thanks again for your suggestions. If you can clear up a few of these counter doubts with respect to any RDBMS, that would be great.
But it looks like the only way to check feasibility would be to implement the unit tests (my implementation of CQLStoreTest). Is that correct?
|
|
Debasish Kanhar <d.k...@...>
To anyone still following this thread: I wanted to post an update in case anyone is interested; otherwise I will close out the thread.
So, we were able to get the adapter working for SnowFlake. We plan to open-source it, and for now it is hosted on GitLab (https://gitlab.com/system-soft-technologies-opensource/janus-snowflake) if anyone is interested in taking a look.
The storage adapter we created models SnowFlake as a KeyValueStore; it could be modelled as a KeyColumnValueStore as well, but it all comes down to deciding on the underlying data structure and doing the required transformations.
But, as suspected, we see a dreadful drop in performance. Since JanusGraph issues various multi-part queries, that slows the whole process down. To be honest, we are not able to do any sort of WRITE operation in a practical amount of time, and can only do READ operations with a slower response (almost on the borderline of what can be considered acceptable). To tackle the WRITE problem, we are implementing a custom write pipeline: we fetch data from the input tables, transform it and load it into a local BerkeleyDB store, then iterate over the BerkeleyDB files/tables and load those into SnowFlake. Our Gremlin server can then pick up from that set of SnowFlake tables to serve any Gremlin READ queries.
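For the first stage of that pipeline, the bulk load into the locally staged BerkeleyJE-backed graph is just ordinary JanusGraph usage; roughly like the sketch below (the directory, labels and properties are made up for illustration, and the second stage, copying the staged store into SnowFlake, is our own code and not shown):

import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

public class BerkeleyStagingLoad {
    public static void main(String[] args) throws Exception {
        // Stage the graph locally in BerkeleyJE first, where writes are fast.
        JanusGraph staging = JanusGraphFactory.build()
                .set("storage.backend", "berkeleyje")
                .set("storage.directory", "/tmp/janusgraph-staging") // illustrative path
                .open();
        GraphTraversalSource g = staging.traversal();

        // In the real pipeline this load is driven by the rows of the input tables.
        g.addV("person").property("name", "alice").as("a")
         .addV("person").property("name", "bob").as("b")
         .addE("knows").from("a").to("b").iterate();
        g.tx().commit();

        staging.close();
        // Second stage (not shown): copy the resulting BerkeleyJE tables into
        // SnowFlake tables, which the Gremlin server then serves reads from.
    }
}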
On Wednesday, 4 December 2019 10:36:08 UTC+5:30, Debasish Kanhar wrote: Thanks Dimitry for the detailed explanation.
Few of my counter questions:
I had gone through the the Big Data model you mentioned and also the architecture diagram of underlaying Titan Blueprints Graph (Some 6 year old repo). I was able to deduce the internal structure how Janus stores the intrinsic Graph elements. It stores as Adjacency list as shown in image you shared where the key is "Row Key for Vertex. Maybe sereliazed version of its vertex ID or unique identifier? ". The row of vertex has 2 components within it except for RowKey. i.e.
1: All unique properites/relations. It in turn contain column within it storing meta information for the relation. This is also sorted by "ColumnKey" which is probably combination of RelationName and some ID.
2: All non unique relations. It is as super column under which we have multiple columns, sorted by "ColumnKey" and retrivable according to that as well.
And, I think these above 2 types of relations are the ones which are queries / retrived from Janus using getSlice method. Is that correct?
But then when I tried to model this as is, I though we would have columns corresponding to each relation/relation super column.But to verify my understanding I did query on internal table structure in Cassandra to understand the structurr, and as mentioned I just saw 3 columns not "N" columns corresponding to each relation I was expecting.
If we are to map those column to the data structure defined above, how do they map?
Key is same as RowKey value is same as collection of unique relations? column1 is collection of all super columns?
If the above is correct, it helps a lot in understanding getSlice method.
But then it brings me to next question: Whenever a row is inserted, it means that either a new vertex is added, or existing vertex is mutated in some way or other. So, based on above understanding, "value" as "serialized format" remains the same. So does its "column1". As the represent some static information regarding a vertex. i.e. its relations and properties. So, whenever you do, "getSlice between sliceStart and sliceEnd", the results won't change unless sliceStart and sliceEnd conditions change. Is that understanding correct as well?
So, is this understanding correct as well: Fetch a row first. Then fetch its subset of properties if and only if they fall under a range of sliceStart and sliceEnd?
Also, you don't need to think this from SnowFlake perspective but think of this from RDBMS perspective. Anything possible in RDBMS is possble in SnowFlake in additon to extra features. So if its possible in RDBMS its logically possible in SnowFlake as well.
Really thanks again for your suggestions. If you can clear a few counter doubts, w.r.t. any RDBMS, that will be great.
But looks like only way to check feasibility study would be to implement Unit tests (My implementation of CQLStoreTest). Is that correct?
On Tuesday, 3 December 2019 05:56:58 UTC+5:30, Dmitry Kovalev wrote: Hi Debashish,
in terms of wrapping one's head around what getSlice() method does - conceptually it is not hard to understand, if you peruse the link I have referred you to in my original reply:
The relevant part of it is really short so I'll just copy it here (with added emphasis in bold):
===quote=== Bigtable Data Model
Under the Bigtable data model each table is a collection of rows. Each row is uniquely identified by a key. Each row is comprised of an arbitrary (large, but limited) number of cells. A cell is composed of a column and value. A cell is uniquely identified by a column within a given row. Rows in the Bigtable model are called "wide rows" because they support a large number of cells and the columns of those cells don’t have to be defined up front as is required in relational databases. JanusGraph has an additional requirement for the Bigtable data model: The cells must be sorted by their columns and a subset of the cells specified by a column range must be efficiently retrievable (e.g. by using index structures, skip lists, or binary search). ===/quote=== Basically, getSlice method is the formal representation of above requirement in bold: based on the order defined for "column keys" space, it should return all "columns" whose keys lay "between" a start and end key values, given in SliceQuery... that is, >= start and <=end... Please refer to the javadoc for more detail. However, answering the question of how do you effectively implement it in your backend is pretty much the crux of your potential contribution. If the underlying DB's data model more or less "natively" supports the above (as e.g. in the case of Cassandra, BDB etc), then it becomes relatively easy. If the underlying data model is different, then it gets us back to the question which has been asked a couple of times in this thread - i.e. whether it is actually feasible and/or desirable to try and implement it? For example, in order to implement it in a "classical" RDBMS, your would have to find one which supports ordering and indexing of byte columns/blobs, and then probably encounter scalability issues if you chose to model the whole key-column-value store as one table with row key, column key and data... It might still be possible to address these issues and implement it reasonably effectively, but it is unclear what would be the point - as you would effectively have to circumvent the "relational/SQL" top abstraction layer, which is the whole point of RDBMS, to get back to lower level implementation details. Unfortunately I know nothing about Snowflake and it's data model, and don't have the time to learn about it in any sufficient detail any time soon, so I cannot really advise you neither on feasibility nor on any implementation details.
Hope this helps,
Dmitry
On Sun, 1 Dec 2019 at 09:04, Debasish Kanhar < d...@...> wrote: Hello any developers following this thread:
As suggested by Dimitry, CQL adopter uses prepared statements, and hence that would be appropriate for me in sense that, I'll be using SQL statements (SnowSQL) for SnowFlake querying using a DAO. Thus CQL and SnowFlake adopter I'm building would be similar and hence makes sense to reference out of those.
As mentioned before, I'm currently blocked at the method getSlice. I know that the method is used while querying the data, but I'm unable to get my head around how does it work internally. A blind implementation might work, but it won't give me an understanding how its working internally. If anyone can help me understand how it works, a similar implementation for SnowFlake becomes easier then.
As mentioned before I'm basing my understanding from CQL adopter. If we look at CQLKeyColumnValueStore under getSlice method, it makes use of this.getSlice prepared statement to fulfill query. The this.getSlice is as follows:
this.getSlice = this.session.prepare(select() .column(COLUMN_COLUMN_NAME) .column(VALUE_COLUMN_NAME) .fcall(WRITETIME_FUNCTION_NAME, column(VALUE_COLUMN_NAME)).as(WRITETIME_COLUMN_NAME) .fcall(TTL_FUNCTION_NAME, column(VALUE_COLUMN_NAME)).as(TTL_COLUMN_NAME) .from(this.storeManager.getKeyspaceName(), this.tableName) .where(eq(KEY_COLUMN_NAME, bindMarker(KEY_BINDING))) .and(gte(COLUMN_COLUMN_NAME, bindMarker(SLICE_START_BINDING))) .and(lt(COLUMN_COLUMN_NAME, bindMarker(SLICE_END_BINDING))) .limit(bindMarker(LIMIT_BINDING)));
The this.getSlice() is used in the method public EntryList getSlice() which uses the prepared statement above to execute some query. When the following happens (Contents of getSlice method)
final Future<EntryList> result = Future.fromJavaFuture( this.executorService, this.session.executeAsync(this.getSlice.bind() .setBytes(KEY_BINDING, query.getKey().asByteBuffer()) .setBytes(SLICE_START_BINDING, query.getSliceStart().asByteBuffer()) .setBytes(SLICE_END_BINDING, query.getSliceEnd().asByteBuffer()) .setInt(LIMIT_BINDING, query.getLimit()) .setConsistencyLevel(getTransaction(txh).getReadConsistencyLevel()))) .map(resultSet -> fromResultSet(resultSet, this.getter)); interruptibleWait(result);
Is following understanding correct? Anyone with JanusGraph and Cassandra expertise can help.
I'm updating the base query from following bindings:
.where(eq(KEY_COLUMN_NAME, query.getKey().asByteBuffer())) .and(gte(COLUMN_COLUMN_NAME, query.getSliceStart().asByteBuffer())) .and(lt(COLUMN_COLUMN_NAME, query.getSliceEnd().asByteBuffer())) .limit(query.getLimit()));
Is above interpolation correct?
So, if we were to model this in any RDBMS (SnowFlake for eg though SnowFlake isn't RDBMS, it is similar in terms of storage and query engine) with 3 columns as (key, value, column1) of datatypes string (varchar with binary info) can something similar query be correct?
SELECT .... FROM keyspace WHERE key = query.getKey().asByteBuffer() and column1 >= query.getSliceStart().asByteBuffer() and column1 < query.getSliceEnd().asByteBuffer()
limit query.getLimit()
Does this sort of query sound similar in terms of what is targeted to achieve? If I can understand the actual meaning of the prepared statements here, I can also base my undertandings for rest of methods which would be required for doing mutations in underlaying backend.
Any help is really appreciated as we are kinda getting tighter and tighter on deadline regarding the feasibility PoC of SnowFlake as backend for JanusGraph.
Thanks in advance
On Thursday, 28 November 2019 21:05:09 UTC+5:30, Debasish Kanhar wrote: Hi Evgeniy,
Thanks for the question. We plan to open source it once implemented but we are still long way from implementation. Will be really grateful to community who can help in any way to achieve this :-)
On Thursday, 28 November 2019 16:16:27 UTC+5:30, Evgeniy Ignatiev wrote:
Hi.
Is this backend open-source/will be open-sourced?
Best regards,
Evgeniy Ignatiev.
On 11/28/2019 1:40 PM, Debasish Kanhar
wrote:
Hi Ryan.
Well that's a very valid question you asked. The current
implementation of backends like Scylla as you mentioned are
really highly performant. There is no specific problem in
mind, but off late I have been dealing with a lot of clients
who are migrating their whole system into SnowFlake, including
the whole Data storage and Analytics components as well.
SnowFlake is a hot upcoming Data storage and warehousing
system.
Those clients are really reluctant to add another storage
component to their application. Reasons can be a lot like due
to high costs, or added complexity of their architecture, or
duplication of data across storages. But at the same time
these clients also want to incorporate Graph Databases and
Graph Analytics into their application as well. This
integration is targeted for those set of customers/clients who
are/have migrating/migrated into SnowFlake and want to have
Graph based component as well. For now, its simply not
possible for them to have JanusGraph with their SnowFlake data
storages.
Hope I was able to explain it clearly :-)
On Wednesday, 27 November 2019 20:40:52 UTC+5:30, Ryan Stauffer
wrote:
Debasish,
This sounds like an interesting project, but I do
have a question about your choice of Snowflake. If I
missed your response to this in the email chain, I
apologize, but what problems with the existing
high-performance backends (Scylla, for instance) are you
trying to solve with Snowflake? The answer to that
would probably inform your specific implementation over
Snowflake.
Thanks,
Ryan
On Wed, Nov 27, 2019 at 3:18 AM Debasish
Kanhar < d...@...> wrote:
Hi Dimitriy,
Sorry about the late response. I was working on
this project part time only till last week when we
moved into full time dev for this PoC. Really thanks
to your pointers and Jason's that we have been able
to start with the development works and we have some
ground work to start with :-)
So,we are modelling SnowFlake (Which is like SQL
File store) as a Key-Value store by creating two
columns namely "Key" and "Value" in each tables. We
are going to define the data type as binary here (Or
Stringified Binary) so that arbitrary data can be
dumped (I feel its of type StaticBuffer Key and
StaticBuffer value. Is that correct? )
Since, we are modelling SnowFlake as Key-Value
store, it makes sense to have a SnowFlakeManager
class implement OrderedKeyValueStore like
for BerkleyJE? Is that correct understanding?
Updates are that we have almost finished
development of SnowFlakeManager class. The required
methods needed are implemented like beginTransaction,
openDatabase though one particular function
not done is mutateMany is not done, but it will be
done as it in turn calls KeyValueStore.insert()
method.
Also, a lot of basic functions in KeyValueStore
is also done like insert (Insert binary key-value),
get (Get from binary key), delete (Delete a row
using binary key). We are kinda stuck at the
function getSlice(). What does it do?
Now, my question here is that, slice query is
used while queryingfor properties for vertices
(edges/properties) by slicing the relations of
vertex and slicing them based on filters/conditions.
The following steps are followed in getSlice
function (BerkleyKeyValueStore - berkleydb &
ColumnValueStore - inmemory) :
- Find the row from the passed key. (Returns a
Binary value against the binary key)
- Fetch slice bounderies, i.e. slice start and
end from query passed
- Apply the slice boundries on the returned
value in 1st step else, fetch the first results
(pt 1) by applying the slicing conditions in
step
My question is related to last step. Since my
data in DB is just "Binary Key-Binary Value", how
can we apply another constraints (slice
conditions) in query? It just doesn't have any
additional meta data to apply slice on as I just
have 2 columns in my table.
Hope my explaination was clear for you to
understand. I want to know primarily how the last
step would work in the data model I described
above (Having 2 columns, one for Key and other for
Value. And each of stringified binary data type).
And, is the data model selected good enough?
Thanks in advance. And I promise this time my
replies will be quicker :-)
On Friday, 25 October 2019 03:17:24 UTC+5:30, Dmitry
Kovalev wrote:
Hi Debashish,
here are my 2 cents:
First of all, you need to be clear with
yourself as to why exactly you want to build a
new backend? E.g. do you find that the
existing ones are sub-optimal for certain use
cases, or they are too hard to set up, or you
just want to provide a backend to a cool new
database in the hope that it will increase
adoption, or smth else? In other words, do you
have a clear idea of what is this new backend
going to provide which the existing ones do
not, e.g. advanced scalability or performance
or ease of setup, or just an option for people
with existing Snowflake infra to put it to a
new use?
Second, you are almost correct, in that basically all you need to implement are three interfaces:
- KeyColumnValueStoreManager, which allows opening multiple instances of named KeyColumnValueStores and provides a certain level of transactional context between the different stores it has opened
- KeyColumnValueStore, which represents an ordered collection of "rows" accessible by keys, where each row is a
- KeyValueStore - basically an ordered collection of key-value pairs, which can be thought of as the individual "columns" of that row and their respective values
Both row and column keys, and the data values, are generic byte data.
Possibly the simplest way to understand the "minimum contract" required by JanusGraph from a backend is to look at the inmemory backend. You will see that:
- KeyColumnValueStoreManager is conceptually a Map of store name -> KeyColumnValueStore,
- each KeyColumnValueStore is conceptually a NavigableMap of "rows" or KeyValueStores (i.e. a "table"),
- each KeyValueStore is conceptually an ordered collection of key -> value pairs ("columns").
In the most basic case, once you implement these three relatively simple interfaces, JanusGraph can take care of all the translation of graph operations such as adding vertices and edges, and of gremlin queries, into a series of read-write operations over a collection of KCV stores. When you open a new graph, JanusGraph asks the KeyColumnValueStoreManager implementation to create a number of specially named KeyColumnValueStores, which it uses to store vertices, edges, and various indices. It also creates a number of "utility" stores which it uses internally for locking, id management etc.
Crucially, whatever stores JanusGraph creates in your backend implementation, and whatever it is using them for, you only need to make sure that you implement those basic interfaces which allow storing arbitrary byte data and accessing it by arbitrary byte keys.
So for your first "naive" implementation, you most probably shouldn't worry too much about translation of the graph model to the KCVS model and back - this is what JanusGraph itself is mostly about anyway. Just use StoreFeatures to tell JanusGraph that your backend supports only the most basic operations, and concentrate on thinking how to best implement the KCVS interfaces with your underlying database/storage system.
Of course, after that, as you start thinking about supporting better levels of consistency/transaction management across multiple stores, about performance, better utilising native indexing/query mechanisms, separate indexing backends, support for a distributed backend model etc. - you will find that there is more to it, and this is where you can gain further insights from the documentation, the existing backend sources and asking more specific questions.
Hope this helps,
Dmitry
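To make the "conceptually a Map / NavigableMap" picture above concrete, here is a minimal sketch using plain Java collections. The class and method names are illustrative only and are not the real JanusGraph inmemory backend classes or SPI signatures.

import java.nio.ByteBuffer;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Conceptual model of the three-layer contract: a manager of named stores,
// each store mapping a row key to an ordered map of column key -> value.
// All keys and values are opaque bytes. (ByteBuffer's natural order compares
// signed bytes, which is close enough for a sketch.)
public class ConceptualKcvBackend {

    // "Store manager": store name -> store ("table")
    private final ConcurrentHashMap<String, NavigableMap<ByteBuffer, NavigableMap<ByteBuffer, ByteBuffer>>> stores =
            new ConcurrentHashMap<>();

    // "openDatabase": create the named store on demand
    public NavigableMap<ByteBuffer, NavigableMap<ByteBuffer, ByteBuffer>> open(String name) {
        return stores.computeIfAbsent(name, n -> new TreeMap<>());
    }

    // Mutation: set one "column" of one row in one store
    public void put(String store, ByteBuffer rowKey, ByteBuffer columnKey, ByteBuffer value) {
        open(store).computeIfAbsent(rowKey, k -> new TreeMap<>()).put(columnKey, value);
    }

    // "getSlice": the columns of one row whose column keys fall in [sliceStart, sliceEnd)
    public NavigableMap<ByteBuffer, ByteBuffer> getSlice(String store, ByteBuffer rowKey,
                                                         ByteBuffer sliceStart, ByteBuffer sliceEnd) {
        NavigableMap<ByteBuffer, ByteBuffer> row = open(store).get(rowKey);
        return row == null ? new TreeMap<>() : row.subMap(sliceStart, true, sliceEnd, false);
    }
}

Seen this way, the Snowflake question reduces to how to implement open, put and getSlice efficiently against tables, while JanusGraph drives everything above this layer.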
|
|
Evgeniy Ignatiev <yevgeniy...@...>
Hello.
Awesome job! I have a couple of questions about your data loading approach, if you don't mind.
Is it simply aggregating writes locally before writing them to Snowflake? Or do you also use BerkeleyDB as a local write-through cache, from which reads are served for data that is not yet in Snowflake?
The drop in performance compared to Cassandra is to be expected: it is not simply RDBMS vs NoSQL but DWH vs NoSQL. Snowflake is really not optimized for many small operations; a single insert has almost the same latency as a bulk insert, and the bulk insert ideally needs to be large enough for the JDBC driver to leverage its internal stage loading optimization. As I understand it, you are going to do that manually through a PUT FILE + COPY INTO combination.
Updates are significantly slower, and single updates are really devastating to performance (an order of magnitude degradation with hundreds of concurrent threads) due to locking behavior and the write amplification that Snowflake micro-partitioning has to perform (overwriting a whole micro-partition and/or creating a single-record file, which results in a single object in the underlying storage such as S3).
Also, bulk reading by means of SQL might not be worth it either, e.g. if you want to use SparkGraphComputer: the Snowflake Spark connector itself issues direct SQL queries only to request metadata, even for natively SQL-backed DataFrames/Datasets. The actual reading happens in parallel on the executors, by offloading data to an S3 stage and reading directly from it.
Best regards,
Evgeniy Ignatiev.
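For reference, a rough sketch of the PUT + COPY INTO bulk-load path mentioned above, using the Snowflake JDBC driver. The account URL, credentials, table, stage and file format here are placeholders/assumptions, not the project's actual configuration.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Properties;

// Sketch only: upload a locally prepared batch file to the table's internal
// stage, then load it into the table in one bulk COPY operation.
public class SnowflakeBulkLoadSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("user", "JANUS_USER");   // placeholder
        props.put("password", "***");       // placeholder
        props.put("db", "GRAPH_DB");
        props.put("schema", "PUBLIC");
        props.put("warehouse", "LOAD_WH");

        try (Connection conn = DriverManager.getConnection(
                "jdbc:snowflake://myaccount.snowflakecomputing.com/", props);
             Statement stmt = conn.createStatement()) {
            // 1. Stage the locally aggregated rows (e.g. a CSV dumped from the
            //    BerkeleyDB staging store) on the table's internal stage.
            stmt.execute("PUT file:///tmp/edgestore_batch.csv @%EDGESTORE AUTO_COMPRESS=TRUE");
            // 2. Bulk-load everything staged for the table in a single COPY.
            stmt.execute("COPY INTO EDGESTORE FROM @%EDGESTORE FILE_FORMAT = (TYPE = CSV)");
        }
    }
}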
On 2/4/2020 12:40 AM, Debasish Kanhar wrote:
To anyone following this thread: wanted to post an update in case anyone is interested; otherwise I will close out the thread.
The storage adapter we created tries to model SnowFlake as a KeyValueStore. It could be modelled as a KeyColumnValueStore as well, but it is all about deciding on the underlying data structure and doing the required transformations.
But, as suspected, we see a dreadful drop in performance. Since JanusGraph issues various multi-part queries, that slows down the process overall. To be honest, we aren't able to do any sort of WRITE operation in a practical amount of time, and are only able to do READ operations with a slower response (almost on the borderline of what can be considered acceptable). To tackle the WRITE issue, we are implementing a custom write pipeline where we fetch data from the input tables, transform it and load it into a local BerkeleyDB store. We then iterate over the local BerkeleyDB files/tables and load those into SnowFlake. Our Gremlin server can then pick up from the set of SnowFlake tables to serve any sort of Gremlin query for READ operations.
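As an aside, a minimal sketch of what the "SnowFlake as a KeyValueStore" model above could look like at the table level, via plain JDBC. The table name, column names and types are assumptions for illustration, not the adapter's actual schema.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch only: one table per store, with opaque binary key and value columns.
public class KeyValueTableSketch {

    static void createStore(Connection conn, String storeName) throws Exception {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE TABLE IF NOT EXISTS " + storeName + " (k BINARY, v BINARY)");
        }
    }

    // Point lookup of a single binary key.
    static byte[] get(Connection conn, String storeName, byte[] key) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT v FROM " + storeName + " WHERE k = ?")) {
            ps.setBytes(1, key);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getBytes(1) : null;
            }
        }
    }
}

A slice/range read over such a table would filter on k between the slice bounds, as in the ordered-map sketch earlier in the thread.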
|
|
Debasish Kanhar <d.k...@...>
Hi Evgeniy Ignatiev.
Sorry about the late response - I might have missed the message. I was really busy trying all sorts of explorations to make SnowFlake + JanusGraph faster, without much success. Well yes, we are using BerkeleyDB as a sort of local write-through cache. That seems to be the only way we can get an acceptable level of performance in the system.
During data loading we load the data into BerkeleyDB, which is cached/backed up at regular intervals. Once the specified time has elapsed, we take a backup/copy of the local BerkeleyDB storage, do a bulk migration by iterating through the BerkeleyDB records, building custom SQL queries and loading them into SnowFlake, and then create a new BerkeleyDB for the next time range.
This way we can do data writes at a much faster rate, and we also use the local, backed-up version of BerkeleyDB to serve read requests, which saves time as well.
The only issue with this approach is that the graphs across backup time intervals are disconnected. For example, if I took one BerkeleyDB backup as of 05-05-2020 and another on 06-04-2020, the graph for the same node will be different depending on the request date.
Since our use case only requires viewing graphs over a specified time period, not over all time, we don't care about that for now, but I feel this might be an important issue if the above solution were made generic.
One possible alternative: each time we create a new local BerkeleyDB after a time window is crossed, we could run another background app that appends to a global BerkeleyDB store, so that one store keeps growing and becomes the de facto local store serving all requests for all dates, helping us build a universal graph for a node. But I don't know how practical that solution is.
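A minimal sketch (illustrative names and paths, not the project's actual pipeline code) of the "iterate through the BerkeleyDB records" step described above: walk the local BerkeleyDB JE store with a cursor and dump the binary key/value pairs into a CSV file that a PUT + COPY INTO bulk load, as in the earlier sketch, can then push into SnowFlake.

import com.sleepycat.je.Cursor;
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.LockMode;
import com.sleepycat.je.OperationStatus;

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.util.Base64;

// Sketch only: export the locally staged key/value pairs to a bulk-loadable file.
public class BerkeleyToCsvSketch {

    static void dump(Database staging, String csvPath) throws Exception {
        DatabaseEntry key = new DatabaseEntry();
        DatabaseEntry value = new DatabaseEntry();
        Cursor cursor = staging.openCursor(null, null);
        try (BufferedWriter out = new BufferedWriter(new FileWriter(csvPath))) {
            // Walk every record in key order; Base64-encode the binary data so it
            // survives the CSV round trip into Snowflake.
            while (cursor.getNext(key, value, LockMode.DEFAULT) == OperationStatus.SUCCESS) {
                out.write(Base64.getEncoder().encodeToString(key.getData()));
                out.write(',');
                out.write(Base64.getEncoder().encodeToString(value.getData()));
                out.newLine();
            }
        } finally {
            cursor.close();
        }
    }
}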
"Snowflake is really not optimized to perform multiple small operations, single insert is almost of the same latency as bulk insert"
This is turning out to be the biggest challenge, as you mentioned. We got on a call with a co-founder of Snowflake, and they are now looking at implementing prepared statements for Snowflake, which are currently missing. I was told that once prepared statements are done, repeated queries can be optimized by 70-90%, and we need exactly that, since the Snowflake queries are repetitive and differ only in the key range. But such an implementation is at least a year away, as he mentioned.
"Updates are significantly slower and single updates are really devastating to performance"
I know, but to my understanding there are no updates. We only do WRITE queries, which we do in bulk while migrating into Snowflake. Our use case doesn't need any writes on the fly, so I haven't explored how the corresponding Snowflake queries would be formulated, but I still didn't see any UPDATE queries, whether in CQL or BerkeleyDB. As for READs, only SELECT statements are executed against Snowflake in our case.
"Also bulk reading by means of SQL might not be worth it too"
Oh, you brought up a really nice point. We are planning to implement bulk reads and writes, so that we can batch the repetitive queries we make to Snowflake into single operations. For example, querying for an index is an iterative process that we can batch, and querying for the edges of a single vertex is an iterative process that we can batch, so a query for 5 vertices becomes 5 batched queries - along those lines. But we will have to change the core code of JanusGraph and TinkerPop so that the batched query results can be reconciled into a single graph object. We have started exploring that, and I'll put together a write-up for the janusgraph-dev and gremlin-users groups, asking anyone from the community who can help with those aspects. It's a long process, but making Snowflake work with JanusGraph is proving to be even more challenging. :-( The reason we have to stick with a Snowflake backend is that the whole application has migrated to Snowflake, and we don't know what to do with our graph component, as I don't know of any graph application that works on top of Snowflake.
Hope you check out my other post on the last point I mentioned, and I hope for some comments from you :-)
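On the "batch the repetitive queries" idea above, a purely illustrative sketch (placeholder table/column names, and assuming Snowflake's TO_BINARY function for hex-encoded keys) of collapsing N single-key reads into one round trip:

import java.util.List;
import java.util.stream.Collectors;

// Sketch only: build one SELECT for many keys instead of one SELECT per key,
// then split the result set back out per key on the client side.
public class BulkReadSketch {

    // The repetitive pattern: one round trip per key.
    static String singleKeyQuery(String hexKey) {
        return "SELECT k, v FROM EDGESTORE WHERE k = TO_BINARY('" + hexKey + "', 'HEX')";
    }

    // The batched pattern: one round trip for the whole set of keys.
    static String bulkKeyQuery(List<String> hexKeys) {
        String inList = hexKeys.stream()
                .map(k -> "TO_BINARY('" + k + "', 'HEX')")
                .collect(Collectors.joining(", "));
        return "SELECT k, v FROM EDGESTORE WHERE k IN (" + inList + ")";
    }
}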
On Friday, 25 October 2019
03:17:24 UTC+5:30, Dmitry
Kovalev wrote:
Hi
Debashish,
here are my 2
cents:
First of all, you
need to be clear with
yourself as to why
exactly you want to
build a new backend?
E.g. do you find that
the existing ones are
sub-optimal for
certain use cases, or
they are too hard to
set up, or you just
want to provide a
backend to a cool new
database in the hope
that it will increase
adoption, or smth
else? In other words,
do you have a clear
idea of what is this
new backend going to
provide which the
existing ones do not,
e.g. advanced
scalability or
performance or ease of
setup, or just an
option for people with
existing Snowflake
infra to put it to a
new use?
Second, you are
almost correct, in
that basically all you
need to implement are
three interfaces:
-
KeyColumnValueStoreManager,
which allows opening
multiple instances of
named
KeyColumnValueStores
and provides a certain
level of transactional
context between
different stores it
has opened
-
KeyColumnValueStore -
which represents an
ordered collection of
"rows" accessible by
keys, where each row
is a
- KeyValueStore -
basically an ordered
collection of
key-value pairs, which
can be though of as
individual "columns"
of that row, and their
respective values
Both row and column
keys, and the data
values are generic
byte data.
Possibly the
simplest way to
understand the
"minimum contract"
required by Janusgraph
from a backend is to
look at the inmemory
backend. You will see
that:
-
KeyColumnValueStoreManager
is conceptually a Map
of store name ->
KeyColumnValueStore,
- each
KeyColumnValueStore is
conceptually a
NavigableMap of "rows"
or KeyValueStores
(i.e. a "table") ,
- each
KeyValueStore is
conceptually an
ordered collection of
key -> value pairs
("columns").
In the most basic
case, once you
implement these three
relatively simple
interfaces, Janusgraph
can take care of all
the translation of
graph operations such
as adding vertices and
edges, and of gremlin
queries, into a series
of read-write
operations over a
collection of KCV
stores. When you open
a new graph,
JanusGraph asks the
KeyColumnValueStoreManager
implementation to
create a number of
specially named
KeyColumnValueStores,
which it uses to store
vertices, edges, and
various indices. It
creates a number of
"utility" stores which
it uses internally for
locking, id management
etc.
Crucially, whatever
stores Janusgraph
creates in your
backend
implementation, and
whatever it is using
them for, you only
need to make sure that
you implement those
basic interfaces which
allow to store
arbitrary byte data
and access it by
arbitrary byte keys.
So for your first
"naive"
implementation, you
most probably
shouldn't worry too
much about translation
of graph model to KCVS
model and back - this
is what Janusgraph
itself is mostly about
anyway. Just use
StoreFeatures to tell
Janusgraph that your
backend supports only
most basic operations,
and concentrate on
thinking how to best
implement the KCVS
interfaces with your
underlying
database/storage
system.
Of course, after
that, as you start
thinking of supporting
better levels of
consistency/transaction
management across
multiple stores, about
performance, better
utilising native
indexing/query
mechanisms, separate
indexing backends,
support for
distributed backend
model etc etc - you
will find that there
is more to it, and
this is where you can
gain further insights
from the
documentation,
existing backend
sources and asking
more specific
questions.
Hope this helps,
Dmitry
--
Best regards,
Evgeniy Ignatiev.