
Re: [DISCUSS] Developer chat (+FoundationDB chat)

Henry Saputra <henry....@...>
 

Thanks for driving this, Florian


On Thu, Apr 16, 2020 at 6:25 AM Florian Hockmann <f...@...> wrote:
You should all be able to join. I'll also create a PR to add it to our README.md.

On Wednesday, 15 April 2020 at 14:16:36 UTC+2, Florian Hockmann wrote:
It looks like we have a wide consensus on starting a new channel for development discussions, but different opinions on whether we should directly create a dedicated channel for FoundationDB and also on whether we should switch to a different system than Gitter.
So, I will create a new channel janusgraph-dev on Gitter and we can then see whether we need dedicated channels for, e.g., FoundationDB. If contributors have a strong opinion on moving to a different system, then please start a different thread for that so we can discuss it in general as Henry also already suggested. Since we didn't reach a consensus on this topic here, I don't want to let it stop the creation of the dev channel.

If that sounds OK to everyone, I will create the janusgraph-dev channel tomorrow on Gitter.

On Tuesday, 14 April 2020 at 23:52:58 UTC+2, Henry Saputra wrote:
Sorry, I just saw this discussion. Thanks for pinging me, Misha.

As Misha and Florian mentioned, we did some investigation and exploration of which "chat" tool we should use for JanusGraph.
We chose Gitter because of the low barrier and maintenance to join and start discussions compared to Slack.

The discussion that Florian started was about a new channel for development discussions in Gitter for JanusGraph, which I think is a great idea, so +1 for it.
This will give us a real-time mechanism, in addition to our mailing list, to talk about development ideas and progress.
As for FoundationDB, I think we could just use the dev channel to discuss it instead of a dedicated channel.

We could discuss moving to or embracing another chat tool like Slack or Discord in another thread, as a separate topic.

Thanks,

- Henry


On Tue, Apr 14, 2020 at 4:19 AM Florian Hockmann <f...@...> wrote:
I think Misha has good arguments for staying on Gitter. I personally don't think that we need a high entry barrier for a developer chat, as we currently don't get many non-dev questions in the Google group for developers either. People instead ask in janusgraph-users (if they ask in a Google group at all), and I don't see why that should be different on Gitter, since they can still ask their usage questions in the main Gitter chat, which is already quite active.
In my opinion, we should also try to keep our development discussions as open as possible so that we don't exclude anyone. If we notice that we still get too many usage questions in a dev chat, then we can still create a private chat room in Gitter. (At least from a quick search, it looks like that's possible with Gitter.)

The other main argument against Gitter and for Discord that I see would be voice chat, but I'm not sure how important that actually is for us.

Another topic discussed here was whether we actually need a dedicated channel just for FoundationDB. I don't have a strong opinion either way, but I suggest that we simply start with a general dev channel, see whether we get a lot of FoundationDB-specific discussions there, and then we can still create a dedicated channel.

On Tuesday, 7 April 2020 at 18:34:24 UTC+2, Jan Jansen wrote:


On 7. Apr 2020, at 18:09, 'Misha Brukman' via JanusGraph developers <jan...@...> wrote:


On Tue, Apr 7, 2020 at 11:36 AM 'Jan Jansen' via JanusGraph developers <jan...@...> wrote:
Some of your reasons are exactly why I'm against Gitter.
Currently, Gitter is just a support chat for JanusGraph. That channel would be hard to use for developer-focused talks. Therefore, we have created private channels or highly moderated channels just for development-focused topics. I think having an extra platform where you have to sign up separately would reduce the support questions massively.

Are you trying to have the same separation on Gitter as we do with the janusgraph-users@ and janusgraph-dev@ mailing lists? Would creating separate channels on Gitter not address this? Or are you saying you've already done this, and it's not helping?
I thought about the separation of the Google groups. I think just a different channel wouldn't help, because some people simply ignore these rules. For example, GitHub issues have a template for questions which says "please ask in the Google group", and people ignore it. In the Google group, you see it less often. I think an extra barrier between both channels would separate these topics better.

I didn't know about the history thing in Slack.


The free version of Slack limits the workspace to the 10,000 most recent messages; however, my understanding is that this is a global limit, not a per-channel one. In some free workspaces I've seen, less active channels end up with essentially zero history, because the busier channels continuously exhaust the 10K most recent messages, so if you're not there to see a message, you'll see almost nothing.
 
I also mentioned Discord, which includes voice channels.

I didn't realize folks would enjoy live voice chat, but it does sound like an interesting option.

Misha 

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jan...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/CANgM2oN0Eh034Rw5qEwWZCFDk8-rJ2vG1bDejgs9LoddbVup9g%40mail.gmail.com.



Re: Batching Queries to backend for faster performance

Debasish Kanhar <d.k...@...>
 

Thanks Pavel.

I've tried all of those, and together with query optimizations they have helped me reduce the execution time by almost 50%. But that's still too slow for my use case. :-)

I'm looking at more and more batching options wherever I can.

You can check out my discussion on gremlin-users (https://groups.google.com/forum/#!topic/gremlin-users/RaIHVbDE5rk) for more clarity on the requirements and details; any ideas on how to implement this would be a great help :-)

On Thursday, 16 April 2020 21:20:40 UTC+5:30, Pavel Ershov wrote:

JG has three options to reduce the number of queries: https://docs.janusgraph.org/basics/configuration-reference/#query

PROPERTY_PREFETCHING -- enabled by default
USE_MULTIQUERY -- disabled by default
BATCH_PROPERTY_PREFETCHING -- disabled by default

The last two require the store to implement OrderedKeyValueStore.getSlices and to enable the multiQuery store feature.
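For a backend that only has a single-range getSlice, a first getSlices can be a plain loop over getSlice. Below is a minimal, self-contained sketch; the interface and type names are simplified stand-ins invented for illustration, not the real janusgraph-core signatures:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for the store types (not the real janusgraph-core API).
interface KVQuery { }

interface SliceStore<R> {
    // The single-range read that an ordered key-value store already has.
    R getSlice(KVQuery query);

    // A first, unbatched getSlices can simply delegate to getSlice in a loop.
    // A real backend would replace this loop with one batched round trip.
    default Map<KVQuery, R> getSlices(List<KVQuery> queries) {
        Map<KVQuery, R> results = new LinkedHashMap<>();
        for (KVQuery q : queries) {
            results.put(q, getSlice(q));
        }
        return results;
    }
}
```

With something like this in place, the store can advertise multiQuery support, and a properly batched implementation later becomes a drop-in replacement for the loop.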



On Wednesday, 8 April 2020 at 19:10:38 UTC+3, Debasish Kanhar wrote:
For anyone following this thread: my primary question was how to implement a multi-get style retrieval for my backend. See Marko's comment (https://groups.google.com/d/msg/gremlin-users/QMVhLIPiGRE/Yf4ByrlrEQAJ) for clarification on what multi-get is.

My Snowflake backend interface, which I've written, doesn't support multiQuery yet. Is implementing multiQuery as simple as the steps below?

Do we need to uncomment the following in the StoreManager class?


And set
features.supportMultiQuery = true;

And implement the following method in KeyValueStore?

Or are there any other changes that need to be made to implement the multiQuery feature for my backend?
The reason I feel multiQuery will help us is that, in our simple use case:
g.V(20520).as("root").bothE().as("e").barrier().otherV().as("oth").barrier().project("r","e","o").by(select("root").valueMap()).by(select("e").valueMap()).by(select("oth").valueMap()).barrier().dedup("r", "e", "o").profile()



Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
GraphStep(vertex,[20520])@[root]                                       1           1        1157.329     2.66
JanusGraphVertexStep(BOTH,edge)@[e]                                   12          12        3854.693     8.86
    \_condition=(EDGE AND visibility:normal)
    \_isFitted=true
    \_vertices=1
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@801a60ee
    \_orders=[]
    \_isOrdered=true
    \_multi=true
  optimization                                                                                     7.573
  backend-query                                                       12                         828.938
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@801a60ee
NoOpBarrierStep                                                       12          12           1.828     0.00
EdgeOtherVertexStep@[oth]                                             12          12           1.132     0.00
NoOpBarrierStep                                                       12          12           0.976     0.00
ProjectStep([r, e, o],[[SelectOneStep(last,root...                    12          12       38502.413    88.46
  SelectOneStep(last,root)                                            12          12           0.508
  PropertyMapStep(value)                                              12          12       12776.430
  SelectOneStep(last,e)                                               12          12           0.482
  PropertyMapStep(value)                                              12          12       15211.454
  SelectOneStep(last,oth)                                             12          12           0.376
  PropertyMapStep(value)                                              12          12       10508.737
NoOpBarrierStep                                                       12          12           5.692     0.01
DedupGlobalStep([r, e, o])                                            12          12           1.925     0.00
                                            >TOTAL                     -           -       43525.993        -

As you can see, retrieving the properties of graph elements (vertices and edges) is the most time-consuming step. On further analysis I realized this is because retrieving a single property from my backend is a single query to the backend. Thus, for N elements (vertices and edges), each with M properties, the total number of calls is N*M, which slows down the whole process and the execution time.
Maybe that's the reason the properties step is the slowest step in my scenario.
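To put rough numbers on the N*M effect, here is a back-of-the-envelope model. The element count matches the 12 edges in the profile above, but the properties-per-element count and the ~100 ms per-query cost are illustrative assumptions, not measurements:

```java
// Back-of-the-envelope model of backend round trips; numbers are illustrative.
public class CallCountModel {

    // One backend query per property: N elements x M properties.
    static long perPropertyCalls(long elements, long propsPerElement) {
        return elements * propsPerElement;
    }

    // One slice query per element that returns all of its properties.
    static long perElementCalls(long elements) {
        return elements;
    }

    public static void main(String[] args) {
        long elements = 12;    // e.g. the 12 edges in the profile above
        long props = 10;       // assumed properties per element
        long msPerQuery = 100; // "hundreds of ms" per Snowflake query

        System.out.println(perPropertyCalls(elements, props) * msPerQuery + " ms unbatched");
        System.out.println(perElementCalls(elements) * msPerQuery + " ms with per-element prefetch");
        System.out.println(1 * msPerQuery + " ms with a single batched getSlices call");
    }
}
```

Under these assumptions the unbatched case costs 120 round trips versus 12 with per-element prefetching, which is why cutting the call count matters far more than speeding up any single query.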

So, will implementing multiQuery optimize performance in such a scenario, and is there anything else that needs to be implemented as well? If yes, I can implement this quickly, and we should immediately see some performance improvements, bringing the new backend closer to the finish line :-)

Thanks in advance.

On Tuesday, 7 April 2020 00:18:38 UTC+5:30, Debasish Kanhar wrote:
Hi All,

The title may be misleading, as I couldn't think of a better one. Let me give a brief overview of the issue we are talking about and the possible solutions we are considering; we will need your suggestions, and help connecting with anyone in the community who can help us with the problem :-)

So, we have a requirement to implement Snowflake as a backend for JanusGraph (https://groups.google.com/forum/#!topic/janusgraph-dev/9JrMYF_01Cc). We were able to model Snowflake as a KeyValueStore and successfully created an interface layer that extends OrderedKeyValueStore to interact with Snowflake (https://gitlab.com/system-soft-technologies-opensource/janus-snowflake). The problem we now face was anticipated: slow response times, for both reads and writes. For every Gremlin query that TinkerPop/JanusGraph issues, the query is broken down into multiple backend queries that are executed one after the other, in sequential order, to build the response to the Gremlin query.

For example, the attached file (query breakdown.txt) shows how a simple Gremlin query like g.V().has("node_label", "user").limit(5).valueMap(true) is broken down into a set of multiple edgestore queries. (I'm not including queries to graphindex and janusgraph_ids, as those come in low volumes.) We have also been able to capture the order in which the queries are executed (the 1st line is the 1st query, the 2nd line is called second, and so on).

My question is: is there some way we can batch these queries? Since Snowflake is a data warehouse, each query takes hundreds of milliseconds to execute. Thus, having e.g. 100 sub-queries, as in the example file, easily takes 10 seconds minimum. We would like to optimize that by batching the queries together so that they can be executed together and their responses reconciled together.

For example if the flow is as follows:

Can we change the flow above, which is the generic flow for TinkerPop databases, to something like the following by introducing an accumulator/aggregator step?

1. Instead of our interface interacting directly with the Snowflake backend, we bring in an aggregation step in between.
2. The aggregation step accumulates the getSlice queries (start key, end key & store name) until all queries that can be batched together have been collected.
3. Once accumulated, it executes all of them together against the backend.
4. Once executed, all of the queries' responses come back to the aggregation step, which splits them up according to the input queries and sends them back to GraphStep for reconciliation and for building the output of the Gremlin query.
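The steps above can be sketched as follows. This is a minimal illustration only: the KV_STORE table and the STORE/KV_KEY/KV_VALUE columns are hypothetical names, and real code would use bind variables rather than string concatenation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Sketch of the proposed aggregation step: collect the key ranges of many
// getSlice calls and emit ONE SQL statement instead of one query per slice.
// Table and column names are hypothetical, for illustration only.
public class SliceAggregator {

    record Range(String store, String startKey, String endKey) { }

    private final List<Range> pending = new ArrayList<>();

    // Called once per getSlice that we want to batch.
    public void add(String store, String startKey, String endKey) {
        pending.add(new Range(store, startKey, endKey));
    }

    // OR'd range predicates replace one round trip per slice; the STORE and
    // KV_KEY columns let the caller demultiplex the combined result set back
    // into per-query answers.
    public String toSingleQuery() {
        StringJoiner where = new StringJoiner(" OR ");
        for (Range r : pending) {
            where.add("(STORE = '" + r.store() + "' AND KV_KEY >= '" + r.startKey()
                    + "' AND KV_KEY < '" + r.endKey() + "')");
        }
        return "SELECT STORE, KV_KEY, KV_VALUE FROM KV_STORE WHERE " + where;
    }
}
```

The design choice here is that batching happens purely at the storage-adapter level: the traversal still thinks it issued many slice queries, but the warehouse sees one statement and pays one round-trip cost.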

As for what we have been doing: we instrumented the JanusGraph core classes so that we can track the flow of information from one class to another whenever a Gremlin query is executed. That way we know, when a Gremlin query runs, which classes are called, iteratively, until we reach our interface's getSlice method, and we can look for repetitive patterns in the query. For that we generated approximately 6000 lines of custom logs that we are tracking.
After analyzing the logs, we arrived at the following flow of classes:

My question is: is this possible from a TinkerPop perspective? From a JanusGraph perspective? Our project is ready to pay any JanusGraph or TinkerPop expert part-time as a freelancer. We are looking for experts in the domain who can help us achieve this. The potential impact is significant: it could lead to performance improvements in existing backends as well, and help execute many memory-intensive queries a lot faster.

Thanks



Re: Batching Queries to backend for faster performance

Pavel Ershov <owner...@...>
 


JG has three options to reduce the number of queries: https://docs.janusgraph.org/basics/configuration-reference/#query

PROPERTY_PREFETCHING -- enabled by default
USE_MULTIQUERY -- disabled by default
BATCH_PROPERTY_PREFETCHING -- disabled by default

The last two require the store to implement OrderedKeyValueStore.getSlices and to enable the multiQuery store feature.






Re: [DISCUSS] Developer chat (+FoundationDB chat)

Florian Hockmann <f...@...>
 

I created the channel: https://gitter.im/janusgraph/janusgraph-dev
You should all be able to join. I'll also create a PR to add it to our README.md.



Re: [DISCUSS] Developer chat (+FoundationDB chat)

Florian Hockmann <f...@...>
 

It looks like we have a wide consensus on starting a new channel for development discussions, but different opinions on whether we should directly create a dedicated channel for FoundationDB and also on whether we should switch to a different system than Gitter.
So, I will create a new channel janusgraph-dev on Gitter and we can then see whether we need dedicated channels for, e.g., FoundationDB. If contributors have a strong opinion on moving to a different system, then please start a different thread for that so we can discuss it in general as Henry also already suggested. Since we didn't reach a consensus on this topic here, I don't want to let it stop the creation of the dev channel.

If that sounds OK to everyone, I will create the janusgraph-dev channel tomorrow on Gitter.

Am Dienstag, 14. April 2020 23:52:58 UTC+2 schrieb Henry Saputra:

Sorry I just saw this discussions. Thanks for pinging me, Misha.

As Misha and Florian had mentioned, we did some investigating and exploring which "chat" tool we will use for JanusGraph.
We chose Gitter due to the low barrier and maintenance to join and start discussions compare to Slack.

The discussion that Florian start with was about new channel for Development discussions in Gitter for JanusGraph, which I think it is great idea, so +1 for it.
This will allow real-time mechanism in addition to our mailing list to talk about development ideas and progress.
As for FoundationDb, I think we could just use Dev channel to discuss about it instead of dedicated channel for it.

We could discuss moving or embrace other chat tool like Slack and Discord in other thread as separate topic.

Thanks,

- Henry


On Tue, Apr 14, 2020 at 4:19 AM Florian Hockmann <f...@...> wrote:
I think Misha has good arguments for staying on Gitter. I personally don't think that we need a high entry barrier for a developer chat as we currently also don't get many non-dev questions in the Google group for developers. People instead ask in janusgraph-users (if they ask in a Google group that is) and I don't see why that should be different on Gitter as they can still ask their usage questions in the main Gitter chat that is already quite active.
We should also try to stay as open as possible with our development discussions to not exclude some people in my opinion. If we notice that we still get too many usage questions in a dev chat, then we can still create a private chat room in Gitter. (It looks at least from a quick search that it's possible to do that with Gitter.)

The other main argument against Gitter and for Discord would be voice chat, but I'm not sure how important that actually is for us.

Another topic discussed here was whether we actually need a dedicated channel just for FoundationDB. I don't have a strong opinion either way, but I suggest we simply start with a general dev channel; if we then see a lot of FoundationDB-specific discussion, we can still create a dedicated channel.

Am Dienstag, 7. April 2020 18:34:24 UTC+2 schrieb Jan Jansen:


On 7. Apr 2020, at 18:09, 'Misha Brukman' via JanusGraph developers <jan...@...> wrote:


On Tue, Apr 7, 2020 at 11:36 AM 'Jan Jansen' via JanusGraph developers <jan...@...> wrote:
Some of your reasons are exactly why I'm against Gitter.
Currently, Gitter is just a support chat for JanusGraph. That channel would be hard to use for developer-focused talks; we would therefore have to create a private or highly moderated channel just for development-focused topics. I think having an extra platform where you have to sign up separately would reduce the support questions massively.

Are you trying to have the same separation on Gitter as we do with janusgraph-users@ and janusgraph-dev@ mailing lists? Would creating separate channels on Gitter not address this? Or are you saying you've already done this, and it's not helping?
I was thinking of the separation of the Google Groups. I think just a different channel wouldn't help, because some people simply ignore these rules. For example, on GitHub issues we have a question template which says to please ask in the Google Group, and they ignore it; in the Google Group you see that less often. I think an extra barrier between both channels would separate these topics better.
 
I didn't know about the history thing in Slack.


The free version of Slack limits the workspace to 10000 recent messages; however, my understanding is that this is a global number, not a per-channel number. In some free workspaces I've seen, for less active channels, this means ~zero history because other channels are so active, that they continuously exhaust the 10K most recent messages, so if you're not there to see the message, you'll see almost nothing.
 
I also mentioned Discord, which also includes voice channels.

I didn't realize folks would enjoy live voice chat, but it does sound like an interesting option.

Misha 

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jan...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/CANgM2oN0Eh034Rw5qEwWZCFDk8-rJ2vG1bDejgs9LoddbVup9g%40mail.gmail.com.



Re: Batching Queries to backend for faster performance

Debasish Kanhar <d.k...@...>
 

For anyone following this thread: my primary question was how to implement multi-get type retrieval w.r.t. my backend. Do check Marko's comment (https://groups.google.com/d/msg/gremlin-users/QMVhLIPiGRE/Yf4ByrlrEQAJ) for clarification on what multi-get is.

The Snowflake backend interface I've written doesn't support multiQuery yet. Is implementing multiQuery as simple as the steps below?

Do we need to uncomment the following in the StoreManager class?


And setting
features.supportMultiQuery = true;

And implement the following method in KeyValueStore?

Or are there any other changes that need to be made to implement the multiQuery feature for my backend?
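For what it's worth, here is a rough sketch of the shape of those two pieces. The type and method names below are illustrative stand-ins only, not the actual JanusGraph interfaces (the real ones live under org.janusgraph.diskstorage.keycolumnvalue.keyvalue and may differ between versions), so treat this as the idea rather than the API:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in for a key-range slice query (store name omitted).
class KVQuery {
    final String start, end;
    KVQuery(String start, String end) { this.start = start; this.end = end; }
}

interface OrderedKeyValueStoreSketch {
    // Single-slice lookup: one backend round trip per call.
    List<String> getSlice(KVQuery query);

    // Batched lookup: what a multiQuery-capable backend would override to
    // answer many slices in one round trip. This default just loops, which
    // is functionally equivalent but gains nothing.
    default Map<KVQuery, List<String>> getSlices(List<KVQuery> queries) {
        Map<KVQuery, List<String>> out = new LinkedHashMap<>();
        for (KVQuery q : queries) {
            out.put(q, getSlice(q));
        }
        return out;
    }
}
```

The corresponding feature flag (the supportMultiQuery setting mentioned above) is what tells JanusGraph it may hand the store a whole batch instead of issuing the slices one by one.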
The reason why I feel multiQuery will help us is that, in our simple use case:
g.V(20520).as("root").bothE().as("e").barrier().otherV().as("oth").barrier().project("r","e","o").by(select("root").valueMap()).by(select("e").valueMap()).by(select("oth").valueMap()).barrier().dedup("r", "e", "o").profile()



Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
GraphStep(vertex,[20520])@[root]                                       1           1        1157.329     2.66
JanusGraphVertexStep(BOTH,edge)@[e]                                   12          12        3854.693     8.86
  \_condition=(EDGE AND visibility:normal)
  \_isFitted=true
  \_vertices=1
  \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@801a60ee
  \_orders=[]
  \_isOrdered=true
  \_multi=true
  optimization                                                                                   7.573
  backend-query                                                       12                       828.938
  \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@801a60ee
NoOpBarrierStep                                                       12          12           1.828     0.00
EdgeOtherVertexStep@[oth]                                             12          12           1.132     0.00
NoOpBarrierStep                                                       12          12           0.976     0.00
ProjectStep([r, e, o],[[SelectOneStep(last,root...                    12          12       38502.413    88.46
  SelectOneStep(last,root)                                            12          12           0.508
  PropertyMapStep(value)                                              12          12       12776.430
  SelectOneStep(last,e)                                               12          12           0.482
  PropertyMapStep(value)                                              12          12       15211.454
  SelectOneStep(last,oth)                                             12          12           0.376
  PropertyMapStep(value)                                              12          12       10508.737
NoOpBarrierStep                                                       12          12           5.692     0.01
DedupGlobalStep([r, e, o])                                            12          12           1.925     0.00
                                            >TOTAL                     -           -       43525.993        -

As you can see, retrieving the properties of graph elements (vertices and edges) is the most time-consuming step. On further analysis I realized this is because retrieving a single property from my backend is a single query to the backend. Thus, for N elements (vertices and edges), each with M properties, the total number of calls is N*M, which slows down the whole process and execution time.
Maybe that's the reason why the properties step is the slowest step in my backend.
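To put rough numbers on that N*M effect, here is a tiny back-of-envelope sketch. Only the 12 elements come from the profile above; the 3 properties per element and ~100 ms per round trip are assumptions for illustration:

```java
// Back-of-envelope model of backend round-trip counts, not a benchmark.
class RoundTripEstimate {
    // N elements, each needing M property reads => N * M single queries.
    static long unbatchedCalls(int elements, int propsPerElement) {
        return (long) elements * propsPerElement;
    }

    // If the backend could answer `batchSize` slices per round trip,
    // the call count drops to ceil(N * M / batchSize).
    static long batchedCalls(int elements, int propsPerElement, int batchSize) {
        long total = unbatchedCalls(elements, propsPerElement);
        return (total + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        long single = unbatchedCalls(12, 3);     // 36 round trips
        long batched = batchedCalls(12, 3, 36);  // 1 combined round trip
        System.out.println(single * 100 + " ms vs " + batched * 100
                + " ms at ~100 ms per query");
    }
}
```

At ~100 ms per query, 36 unbatched calls cost on the order of seconds, while a single combined call stays near the per-query latency floor.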

So, will implementing multiQuery optimize performance in such a scenario, and is there anything else that needs to be implemented as well? If yes, I can quickly implement this, we can immediately see some performance improvements, and adding the new backend moves closer to the finish line :-)

Thanks in advance.

On Tuesday, 7 April 2020 00:18:38 UTC+5:30, Debasish Kanhar wrote:
Hi All,

Well, the title may be misleading, as I couldn't think of a better one. Let me give a brief overview of the issue we are talking about and the possible solutions we are considering; we will need your suggestions, and help connecting with anyone in the community who can assist us with the problem :-)

So, we have a requirement where we want to implement Snowflake as a backend for JanusGraph (https://groups.google.com/forum/#!topic/janusgraph-dev/9JrMYF_01Cc). We were able to model Snowflake as a KeyValueStore, and we successfully created an interface layer which extends OrderedKeyValueStore to interact with Snowflake (https://gitlab.com/system-soft-technologies-opensource/janus-snowflake). The problem we face now was anticipated: slower response times, for both READ and WRITE, because every Gremlin query which TinkerPop/JanusGraph issues is broken down into multiple queries which are executed sequentially, one after the other, to build the response to the Gremlin query.

For example, look at the attached file (query breakdown.txt); it shows how a simple Gremlin query like g.V().has("node_label", "user").limit(5).valueMap(true) is broken down into a set of multiple edgestore queries. (I'm not including queries to graphindex and janusgraph_ids, as those are low volume.) We have also been able to capture the order in which the queries are executed (the 1st line is the 1st query, the 2nd line is called second, and so on).

My problem here is: is there some way we can batch the queries? Since Snowflake is a data warehouse, each query takes hundreds of milliseconds to execute. Thus, having for example 100 sub-queries, as in the example file, easily takes 10 seconds minimum. We would like to optimize that by batching the queries together, so that they can be executed together and their responses reconciled together.

For example, if the flow is as follows (the original flow diagram is not reproduced here):

Can we change this flow, which is the generic flow for TinkerPop databases, to something like the one below by introducing an accumulator/aggregator step?

  • Instead of our interface interacting directly with the Snowflake backend, we bring in an aggregation step in between.
  • The aggregation step accumulates all the getSlice queries (start key, end key, and store name) until all queries that can be grouped together have been accumulated.
  • Once accumulated, it executes all of them together against the backend.
  • Once executed, we get all the queries' responses back at the aggregation step, break them down according to the input queries, and send them back to the GraphStep for reconciliation and for building the output of the Gremlin query.
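The accumulate-then-flush idea described above could be sketched roughly as follows. Everything here (SliceRequest, QueryAggregator, the backend function) is a hypothetical illustration, not JanusGraph or TinkerPop API:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// One accumulated getSlice request: store name plus start/end key.
class SliceRequest {
    final String store, startKey, endKey;
    SliceRequest(String store, String startKey, String endKey) {
        this.store = store; this.startKey = startKey; this.endKey = endKey;
    }
}

// Sits between the query engine and the backend: collects requests,
// executes them in one combined call, then fans the results back out.
class QueryAggregator {
    private final List<SliceRequest> pending = new ArrayList<>();
    private final Function<List<SliceRequest>, List<List<String>>> backend;

    QueryAggregator(Function<List<SliceRequest>, List<List<String>>> backend) {
        this.backend = backend;
    }

    // Accumulate queries that can be grouped together.
    void accumulate(SliceRequest request) {
        pending.add(request);
    }

    // One backend round trip for everything pending, then split the
    // combined result back into per-request answers for reconciliation.
    Map<SliceRequest, List<String>> flush() {
        List<List<String>> results = backend.apply(new ArrayList<>(pending));
        Map<SliceRequest, List<String>> out = new LinkedHashMap<>();
        for (int i = 0; i < pending.size(); i++) {
            out.put(pending.get(i), results.get(i));
        }
        pending.clear();
        return out;
    }
}
```

For a SQL-backed store like Snowflake, the backend function would translate the batch into a single combined statement (e.g. one query with a UNION or an IN over key ranges) rather than looping.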

As for what we have been doing: we edited the JanusGraph core classes so that we can track the flow of information from one class to another whenever a Gremlin query is executed. That way we know, when a Gremlin query runs, which classes are called, iteratively, until we reach our interface's getSlice method, and we look for repetitive patterns in the query. For that, we have produced approximately 6000 lines of custom logs which we are analyzing.
After analyzing the logs, we arrived at a flow of classes (the diagram is not reproduced here).
My question is: is this possible from a TinkerPop perspective? From a JanusGraph perspective? Our project is ready to pay JanusGraph or TinkerPop experts part-time as freelancers; we are looking for any domain experts who can help us achieve this. The potential results are tremendous: this could also lead to performance improvements in existing backends and help execute a lot of memory-intensive queries much faster.

Thanks



[RESULT][VOTE] JanusGraph 0.5.1 release

Oleksandr Porunov <alexand...@...>
 

This vote is now closed with a total of 4 +1s, no +0s and no -1s. The results are:

BINDING VOTES:

+1  (3 -- Oleksandr Porunov, Jan Jansen, Florian Hockmann)
0   (0)
-1  (0)

NON-BINDING VOTES:

+1 (1 -- Nicolas)
0  (0)
-1 (0)

Thank you very much,
Oleksandr Porunov



Re: [DISCUSS] Developer chat (+FoundationDB chat)

Jan Jansen <faro...@...>
 

Hi Misha,
Some of your reasons are exactly why I'm against Gitter.
Currently, Gitter is just a support chat for JanusGraph. That channel would be hard to use for developer-focused talks; we would therefore have to create a private or highly moderated channel just for development-focused topics. I think having an extra platform where you have to sign up separately would reduce the support questions massively.

I didn't know about the history thing in Slack. I also mentioned Discord, which also includes voice channels.

Greetings,
Jan

On 7. Apr 2020, at 16:00, 'Misha Brukman' via JanusGraph developers <janus...@...> wrote:


FWIW, when we originally started JanusGraph, Henry Saputra (cc'd) led an extensive evaluation of various chat options, and we settled on Gitter, primarily because:
  • unlike Slack, Gitter doesn't require running a bot to have people join a workspace: by default, Slack is closed to sign-ups; you can whitelist some domains, but you can't make it "public", so you end up having to run a bot like Slackin or similar (with appropriate credentials) just so that folks can join the chat
  • you have to create another account just for Slack, whereas with Gitter you can trivially sign in via OAuth using your existing GitHub or Twitter account
  • on the free plan, Slack limits how much history you can see, which decreases visibility for folks who are not signed in 24/7; Gitter provides full free access to the entire history
There may have been other issues which I'm forgetting; Henry, please add what I missed.

Overall, I would recommend that we not splinter the chat rooms across more than one service: since we're already using Gitter for chat, can't we just add another room there for FoundationDB or any other topic? Gitter also has good support for Markdown and code formatting; I'm not sure what exactly we'd be missing from Slack.

Misha

On Tue, Apr 7, 2020 at 3:52 AM Debasish Kanhar <d.k...@...> wrote:
Sounds good Jan.

Let's go with the option where we have the most developers/committers involved, so that users who post questions can reach as much of our community as possible :-)

On Tuesday, 7 April 2020 12:03:33 UTC+5:30, Jan Jansen wrote:
Hi

We have three different chat options. If no one wants to add anything, I will open a vote tomorrow to decide which platform we want to use.

Here the options again:
  • Gitter
  • Slack
  • Discord



Re: [VOTE] JanusGraph 0.5.1 release

Florian Hockmann <f...@...>
 

So, do we keep this VOTE thread for this release?

In that case: I tested the new distributions by starting JanusGraph Server with the in-memory backend on both, as described in the docs, and then successfully connected to it with the Gremlin Console via remote. For the full distribution, I also tested the same with bin/janusgraph.sh. Everything worked as expected.

I therefore change my VOTE to +1.

Am Samstag, 4. April 2020 19:23:58 UTC+2 schrieb Oleksandr Porunov:

The gremlin-server issue has been fixed. Release jars, Sonatype artifacts, and the git tag have been updated.

On Wednesday, April 1, 2020 at 12:30:31 PM UTC-7, Oleksandr Porunov wrote:
I have submitted PR to fix the problem with default gremlin-server : https://github.com/JanusGraph/janusgraph/pull/2067

I also checked that Gremlin Server has the same problem in version 0.4.0. It doesn't even start by default in version 0.3.1, so this bug has been around for a long time.
I also agree with your suggestions about the naming of releases. Maybe it is better to keep `janusgraph-0.5.1.zip` as the full JanusGraph version and `janusgraph-0.5.1-truncated.zip` as the truncated version.

On Wednesday, April 1, 2020 at 9:54:50 AM UTC-7, Florian Hockmann wrote:
It fails for both versions. The big problem I see with this is that our docs say the following about it:

By default, this will instruct JanusGraph to use it's own inmemory backend instead of a dedicated database server.

so it should really not expect some externally running Cassandra or Elasticsearch. This problem was probably already present at least in 0.5.0, but I think it gets worse with this release because we now have the truncated zip archive as the default distribution (it's named just janusgraph-[version].zip after all, not janusgraph-[version]-truncated.zip), and that no longer contains a way to easily start JanusGraph Server as described in the docs. For the full distribution, users could at least switch back to janusgraph.sh, which should still work.
That is why I think we should really fix this before releasing a truncated distribution; or, if we really want to release now, we should treat the full archive as the default distribution and the truncated one as an addition for users who want to save some disk space and know that they have to manage backends on their own.

Am Mittwoch, 1. April 2020 18:21:14 UTC+2 schrieb Oleksandr Porunov:
Florian, thank you for catching it. I haven't checked yet, but does "./bin/gremlin-server.sh start" fail on both the truncated and the full JanusGraph versions? I.e., I assume you should run CQL and ES externally when you are using a truncated JanusGraph version. We already have information in our documentation that "This requires to download janusgraph-full-0.5.1.zip instead of the default janusgraph-0.5.1.zip". I.e., I assume it is OK that the default ./bin/gremlin-server.sh start fails in the truncated version, because that is described in the documentation.
Also, if it fails in the full version too, does it work in any older JanusGraph version (i.e. 0.4.0 or 0.3.0)? If it doesn't work with older JanusGraph versions either, then we haven't introduced any new bugs and the bugfix can be postponed to a future release (unless it can be fixed quickly or is critical). That said, I haven't checked the bugs you described yet; I will try to check them later.

On Wednesday, April 1, 2020 at 7:26:46 AM UTC-7, Florian Hockmann wrote:
I just tried both zip archives, following our docs on local installation, which simply start JanusGraph Server with ./bin/gremlin-server.sh start. This fails silently, which one notices when trying to connect from the console with :remote connect, as that results in this error message: gremlin-groovy is not an available GremlinScriptEngine

This seems to be caused by the fact that gremlin-server.sh uses the default gremlin-server.yaml which uses CQL and ES.

If we now point users to gremlin-server.sh to get started, instead of janusgraph.sh (which isn't included in the truncated version), then we need to make sure that this actually works out of the box.

So, my VOTE is -1.

Am Samstag, 28. März 2020 23:27:49 UTC+1 schrieb Jan Jansen:
Hi,
I have tested the truncated binary distribution using a pre-built Docker image of JanusGraph.
My vote is +1.

Greetings,
Jan

On Thursday, March 26, 2020 at 2:21:41 AM UTC+1, Oleksandr Porunov wrote:
Hello,

We are happy to announce that JanusGraph 0.5.1 is ready for release.

The release artifacts can be found at this location:
        https://github.com/JanusGraph/janusgraph/releases/tag/v0.5.1

A full binary distribution is provided for user convenience:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.5.1/janusgraph-full-0.5.1.zip

A truncated binary distribution is provided:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.5.1/janusgraph-0.5.1.zip

The GPG key used to sign the release artifacts is available at:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.5.1/KEYS

The docs can be found here:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.5.1/janusgraph-0.5.1-doc.zip

The release tag in Git can be found here:
        https://github.com/JanusGraph/janusgraph/tree/v0.5.1

The release notes are available here:
        https://github.com/JanusGraph/janusgraph/blob/v0.5/docs/changelog.md#version-051-release-date-march-25-2020

This [VOTE] will open for the next 3 days --- closing Saturday, March 29, 2020 at 01:30 AM GMT.
All are welcome to review and vote on the release, but only votes from TSC members are binding.
My vote is +1.

Thank you,
Oleksandr Porunov


Re: [DISCUSS] Developer chat (+FoundationDB chat)

Misha Brukman <mbru...@...>
 

FWIW, when we originally started JanusGraph, Henry Saputra (cc'd) led an extensive evaluation of various chat options and we settled on Gitter, primarily because:
  • unlike Slack, Gitter doesn't require running a bot to let people join a workspace: by default, Slack is closed to sign-ups; you can whitelist some domains, but you can't make it "public", so you end up having to run a bot like Slackin or similar (with appropriate credentials) just so that folks can join the chat
  • you have to create another account just for Slack — with Gitter, you can just trivially sign-in via OAuth, using your existing GitHub or Twitter account
  • on the free plan, Slack limits how much history you can see, which decreases visibility for folks who are not signed in 24/7; Gitter provides full free access to the entire history
There may have been other issues which I'm forgetting; Henry, please add what I missed.

Overall, I would recommend that we not splinter the chat rooms across more than 1 service: since we're already using Gitter for chat, can't we just add another room there for FoundationDB or any other topic? It also has good support for Markdown and code formatting; I'm not sure what exactly we're missing from Slack with Gitter.

Misha

On Tue, Apr 7, 2020 at 3:52 AM Debasish Kanhar <d.k...@...> wrote:
Sounds good Jan.

Let's go with the option that the most developers/committers are involved with or use, so that question posters/users can reach the largest part of our community :-)

On Tuesday, 7 April 2020 12:03:33 UTC+5:30, Jan Jansen wrote:
Hi

We have three different chat options. If no one wants to add something, I will open up a vote to decide which platform we want to use. (This will happen tomorrow.)

Here are the options again:
  • Gitter
  • Slack
  • Discord



Re: [DISCUSS] Developer chat (+FoundationDB chat)

Debasish Kanhar <d.k...@...>
 

Sounds good Jan.

Let's go with the option that the most developers/committers are involved with or use, so that question posters/users can reach the largest part of our community :-)

On Tuesday, 7 April 2020 12:03:33 UTC+5:30, Jan Jansen wrote:
Hi

We have three different chat options. If no one wants to add something, I will open up a vote to decide which platform we want to use. (This will happen tomorrow.)

Here are the options again:
  • Gitter
  • Slack
  • Discord


Re: [DISCUSS] Developer chat (+FoundationDB chat)

Jan Jansen <faro...@...>
 

Hi

We have three different chat options. If no one wants to add something, I will open up a vote to decide which platform we want to use. (This will happen tomorrow.)

Here are the options again:
  • Gitter
  • Slack
  • Discord


Batching Queries to backend for faster performance

Debasish Kanhar <d.k...@...>
 

Hi All,

Well, the title may be misleading, as I couldn't think of a better one. Let me give a brief overview of the issue we are talking about and the possible solutions we are considering; we will need your suggestions and help to connect with anyone in the community who can help us with the problem :-)

So, we have a requirement to implement Snowflake as a backend for JanusGraph (https://groups.google.com/forum/#!topic/janusgraph-dev/9JrMYF_01Cc). We were able to model Snowflake as a KeyValueStore and successfully created an interface layer that extends OrderedKeyValueStore to interact with Snowflake (https://gitlab.com/system-soft-technologies-opensource/janus-snowflake). The problem we now face is the anticipated one: slow response times, for both READ and WRITE. Every Gremlin query that TinkerPop/JanusGraph issues is broken down into multiple queries, which are executed one after another in sequential order to build the response to the Gremlin query.

For example, the attached file (query breakdown.txt) shows how a simple Gremlin query like g.V().has("node_label", "user").limit(5).valueMap(true) is broken down into a set of multiple edgestore queries. (I'm not including queries to graphindex and janusgraph_ids, as those are low volume.) We have also captured the order in which the queries are executed (the 1st line is the 1st query, the 2nd line is called second, and so on).

My problem here is: is there some way we can batch the queries? Since Snowflake is a data warehouse, each query takes hundreds of milliseconds to execute, so having 100 sub-queries, as in the example file, easily takes 10 seconds minimum. We would like to optimize that by batching the queries together, so that they can be executed together and their responses reconciled together.

For example, if the flow is as follows:

[flow diagram not preserved in this archive]

Can we change the above flow, which is the generic flow for TinkerPop databases, by bringing in an Accumulator/Aggregator step, as described below?

Instead of our interface interacting directly with the Snowflake backend, we bring in an Aggregation step in between:
  1. The Aggregation step accumulates all the getSlice queries (start key, end key, and store name) until all queries that can be compartmentalized have been collected.
  2. Once accumulated, it executes all of them together against the backend.
  3. Once executed, all the queries' responses come back to the Aggregation step (output), which breaks them down according to the input queries and sends them back to GraphStep for reconciliation and for building the output of the Gremlin query.
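For what it's worth, the aggregation idea could be sketched roughly like this. This is a hypothetical sketch, not JanusGraph or TinkerPop API: the class name SliceBatcher and the query_id column are invented for illustration, and a real implementation would use bind parameters rather than string concatenation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical aggregation step: collect getSlice calls and emit one batched
// SQL statement instead of paying one Snowflake round trip per slice.
public class SliceBatcher {
    record SliceRequest(String store, String key, String start, String end) {}

    private final List<SliceRequest> pending = new ArrayList<>();

    public void add(String store, String key, String start, String end) {
        pending.add(new SliceRequest(store, key, start, end));
    }

    public int size() { return pending.size(); }

    // Build one statement; the query_id column lets the caller split the
    // combined result set back into per-request answers afterwards.
    public String toBatchedSql() {
        StringBuilder sql = new StringBuilder();
        for (int i = 0; i < pending.size(); i++) {
            SliceRequest r = pending.get(i);
            if (i > 0) sql.append(" UNION ALL ");
            sql.append(String.format(
                "SELECT %d AS query_id, col, val FROM %s WHERE key = '%s' AND col >= '%s' AND col < '%s'",
                i, r.store(), r.key(), r.start(), r.end()));
        }
        return sql.toString();
    }
}
```

The reconciliation step would then group the combined result set by query_id and hand each group back as the answer to the corresponding getSlice call.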

As for what we have been doing: we edited the JanusGraph core classes so that we can track the flow of information from one class to another whenever a Gremlin query is executed. That way we know, when a Gremlin query runs, which classes are called, iteratively, until we reach our interface's getSlice method, and we can look for repetitive patterns in the query. For that, we have produced approximately 6000 lines of custom logs that we are tracking.
After analyzing the logs, we arrived at the following flow of classes:

[class-flow diagram not preserved in this archive]
My question is: is this possible from a TinkerPop perspective? From a JanusGraph perspective? Our project is ready to pay any JanusGraph or TinkerPop expert part time as a freelancer. We are looking for any experts in the domain who can help us achieve this. The impact of this use case is tremendous: it could also lead to performance improvements in existing backends and help execute a lot of memory-intensive queries much faster.

Thanks



Re: Anyone with experience of adding new Storage backend for JanusGraph ? [Help needed w.r.t SnowFlake]

Debasish Kanhar <d.k...@...>
 

Hi Evgeniy Ignatiev.

Sorry about the late response; I might have missed the message. I was really busy trying all sorts of possible explorations to make Snowflake + JanusGraph faster, but without much success.
Yes, we are using BerkeleyDB as a sort of local write-through cache. That seems to be the only way we can get an acceptable level of performance from the system.

During data loading, we load the data into BerkeleyDB, which is cached/backed up at regular intervals. Once the specified time has elapsed, we take a backup/copy of the local BerkeleyDB storage, then do a bulk migration by iterating through the BerkeleyDB records, creating custom SQL queries, and loading them into Snowflake; then we create another new BerkeleyDB for the new time range.

This way, we are able to do data writes at a much faster rate, and we also make use of the local backed-up BerkeleyDB to serve read requests, which saves time.

The only issue with this approach is that the graphs across backup time intervals are disconnected. I.e., let's say I took a BerkeleyDB backup for 05-05-2020 and another on 06-04-2020; the graphs for the same node will then differ based on the request date.
Since our use case requires us to view graphs only across a specified time period, and not across all time, we don't care about that for now, but I feel this might become an important issue if the above solution is made generic.
One possible alternative: each time we create a new local BerkeleyDB after the specified time window is crossed, a background app could append to a global BerkeleyDB store, so that one store keeps growing in size and becomes the de facto local store to serve all requests for all dates, helping us build a universal graph for a node. But I don't know how practical this solution is.


Snowflake is really not optimized to perform multiple small operations, single insert is almost of the same latency as bulk insert
This is turning out to be the biggest challenge, as you mentioned. We got on a call with a co-founder of Snowflake, and they are now looking at implementing prepared statements for Snowflake, which are currently missing. He told me that once prepared statements are done, the repeated queries can be optimized by 70-90%,
and we need exactly that, since our Snowflake queries are repetitive except for the key range. But such an implementation is at least a year away, as he mentioned.

Updates are significantly slower and single updates are really devastating to performance
I know, but per my understanding there are no updates. We only do WRITE queries, which we do in bulk while migrating to Snowflake. Our use case doesn't need any writes on the fly, so I haven't explored how the corresponding Snowflake queries are formulated, but I still didn't see any UPDATE queries, be it with CQL or BerkeleyDB. As for READs, only SELECT statements are executed against Snowflake in our case.

Also bulk reading by means of SQL might not be worth it too
Oh, you brought up a really nice point. We are planning to implement bulk reads and writes, so that we can batch the repetitive queries we issue to Snowflake into single operations: querying the index is an iterative process we can batch, as is querying the edges of a single vertex. So a query for 5 vertices becomes 5 batched queries, along those lines. But we will have to change the core code of JanusGraph and TinkerPop so that batched query results can be reconciled into a single graph object. We have started exploring that, and I'll write it up for the janusgraph-dev and gremlin-users groups, asking anyone from the community who can help with those aspects. It's a long process, but making Snowflake work with JanusGraph is proving quite challenging. :-( The reason we have to stick with the Snowflake backend is that the whole application has migrated to Snowflake, and we don't know what to do with our graph component, as I don't know of any graph application that works on top of Snowflake.

Hope you check out my other post on the last point I mentioned; I'm hoping for some comments from you :-)

On Tuesday, 4 February 2020 16:42:09 UTC+5:30, Evgeniy Ignatiev wrote:

Hello.

Awesome job! I have a couple of questions about your data loading approach if you don't mind.

Is it simply aggregating writes locally before writing them to Snowflake? Or do you also use BerkeleyDB as a local write-through cache, from which reads are served for data that is not yet in Snowflake?

The drop in performance sounds expected in comparison to Cassandra: it is not simply RDBMS vs. NoSQL, but DWH vs. NoSQL. Snowflake is really not optimized to perform multiple small operations; a single insert has almost the same latency as a bulk insert, ideally a significantly large bulk insert so that the JDBC driver can leverage the internal stage loading optimization (as I understand, you are going to do that manually through a PUT FILE + COPY INTO combination). Updates are significantly slower, and single updates are really devastating to performance (an order of magnitude degradation with hundreds of concurrent threads) due to locking behavior and the write amplification that Snowflake micro-partitioning has to perform (overwriting a whole micro-partition and/or creating a single-record file, which results in a single object stored in underlying storage like S3).

Also, bulk reading by means of SQL might not be worth it either. E.g. if you want to use SparkGraphComputer: the Snowflake Spark connector itself issues direct SQL queries only to request metadata, even for native SQL-backed DataFrames/Datasets. Actual reading happens in parallel from the executors by offloading data to an S3 stage and reading directly from it.

Best regards,
Evgeniy Ignatiev.

On 2/4/2020 12:40 AM, Debasish Kanhar wrote:
To anyone following this thread: I wanted to post an update in case anyone is interested; otherwise I will close out the thread.

So, we were able to get the adapter working for Snowflake. We plan on open-sourcing it, and for now it's hosted on GitLab (https://gitlab.com/system-soft-technologies-opensource/janus-snowflake) if anyone is interested in taking a look.

The storage adapter we created tries to model Snowflake as a KeyValueStore; it could be modelled as a KeyColumnValueStore as well, but it's all about deciding on the underlying data structure and doing the required transformations.

But, as suspected, we see a dreadful drop in performance. Since JanusGraph issues various multi-part queries, that slows down the process overall. To be honest, we aren't able to do any sort of WRITE operation in practical time, and are only able to do READ operations with slow responses (almost on the borderline of what can be considered acceptable). To tackle the WRITE problem, we are implementing a custom write pipeline where we fetch data from input tables, transform it, and load it into a local BerkeleyDB store. We then iterate over the local BerkeleyDB files/tables and load those into Snowflake. Our Gremlin server can then serve any Gremlin READ queries from the set of Snowflake tables.


On Wednesday, 4 December 2019 10:36:08 UTC+5:30, Debasish Kanhar wrote:
Thanks, Dmitry, for the detailed explanation.

A few counter-questions:

I went through the Bigtable data model you mentioned and also the architecture diagram of the underlying Titan Blueprints graph (from a roughly 6-year-old repo). I was able to deduce the internal structure of how Janus stores the intrinsic graph elements. It stores an adjacency list, as shown in the image you shared, where the key is the row key for the vertex (maybe a serialized version of its vertex ID or another unique identifier?). Apart from the row key, the row for a vertex has 2 components:
1: All unique properties/relations. Each in turn contains columns storing meta-information for the relation, sorted by a "column key", which is probably a combination of the relation name and some ID.
2: All non-unique relations, as a super column containing multiple columns, sorted by "column key" and retrievable accordingly.

And I think these 2 types of relations are the ones that are queried/retrieved from Janus using the getSlice method. Is that correct?

But when I tried to model this as-is, I thought we would have columns corresponding to each relation/relation super column. To verify my understanding, I queried the internal table structure in Cassandra, and as mentioned I saw just 3 columns, not the N columns (one per relation) I was expecting.

If we are to map those columns to the data structure defined above, how do they map?
key is the same as the row key
value is the same as the collection of unique relations?
column1 is the collection of all super columns?

If the above is correct, it helps a lot in understanding getSlice method.

But then it brings me to next question:
Whenever a row is inserted, it means that either a new vertex was added or an existing vertex was mutated in some way. So, based on the above understanding, "value" as a serialized format stays the same, and so does "column1", as they represent static information about a vertex, i.e. its relations and properties. So, whenever you do getSlice between sliceStart and sliceEnd, the results won't change unless the sliceStart and sliceEnd conditions change. Is that understanding correct as well?

So, is this understanding correct as well: fetch a row first, then fetch the subset of its properties if and only if they fall within the range between sliceStart and sliceEnd?

Also, you don't need to think about this from a Snowflake perspective; think of it from an RDBMS perspective. Anything possible in an RDBMS is possible in Snowflake, plus extra features, so if it's possible in an RDBMS it's logically possible in Snowflake as well.

Thanks again for your suggestions. If you can clear up a few of these counter-doubts, with respect to any RDBMS, that would be great.

But it looks like the only way to check feasibility would be to implement the unit tests (my implementation of CQLStoreTest). Is that correct?


On Tuesday, 3 December 2019 05:56:58 UTC+5:30, Dmitry Kovalev wrote:
Hi Debasish,

In terms of wrapping one's head around what the getSlice() method does: conceptually it is not hard to understand if you peruse the link I referred you to in my original reply.

The relevant part of it is really short so I'll just copy it here (with added emphasis in bold):
===quote===

Bigtable Data Model

Under the Bigtable data model each table is a collection of rows. Each row is uniquely identified by a key. Each row is comprised of an arbitrary (large, but limited) number of cells. A cell is composed of a column and value. A cell is uniquely identified by a column within a given row. Rows in the Bigtable model are called "wide rows" because they support a large number of cells and the columns of those cells don’t have to be defined up front as is required in relational databases.

JanusGraph has an additional requirement for the Bigtable data model: The cells must be sorted by their columns and a subset of the cells specified by a column range must be efficiently retrievable (e.g. by using index structures, skip lists, or binary search).

===/quote===

Basically, the getSlice method is the formal representation of the above requirement in bold: based on the order defined for the "column key" space, it should return all "columns" whose keys lie between the start and end key values given in the SliceQuery, that is, >= sliceStart and < sliceEnd (the end bound is exclusive). Please refer to the javadoc for more detail.
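As a simplified illustration of that contract (not JanusGraph's actual API: String column keys stand in for the byte buffers it really uses), a row whose cells are kept sorted by column key can answer a slice query with a plain range lookup:

```java
import java.util.*;

// Simplified sketch of the getSlice contract: given a row whose cells are
// sorted by column key, return the cells with column key in
// [sliceStart, sliceEnd). String keys are used here purely for illustration.
public class SliceContractDemo {
    static SortedMap<String, String> getSlice(NavigableMap<String, String> row,
                                              String sliceStart, String sliceEnd) {
        // NavigableMap.subMap provides the efficient range retrieval the
        // Bigtable model requires (start inclusive, end exclusive).
        return row.subMap(sliceStart, true, sliceEnd, false);
    }
}
```

This is essentially what the inmemory backend does conceptually; a disk-backed store has to provide the same semantics via its own index structures.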


However, answering the question of how to implement it effectively in your backend is pretty much the crux of your potential contribution.

If the underlying DB's data model more or less "natively" supports the above (as e.g. in the case of Cassandra, BDB etc), then it becomes relatively easy.

If the underlying data model is different, then it gets us back to the question which has been asked a couple of times in this thread - i.e. whether it is actually feasible and/or desirable to try and implement it?

For example, in order to implement it in a "classical" RDBMS, you would have to find one which supports ordering and indexing of byte columns/blobs, and then probably encounter scalability issues if you chose to model the whole key-column-value store as one table with row key, column key and data... It might still be possible to address these issues and implement it reasonably effectively, but it is unclear what the point would be, as you would effectively have to circumvent the relational/SQL abstraction layer, which is the whole point of an RDBMS, to get back to lower-level implementation details.

Unfortunately I know nothing about Snowflake and its data model, and I don't have the time to learn about it in sufficient detail any time soon, so I cannot really advise you on either feasibility or implementation details.

Hope this helps,

Dmitry

On Sun, 1 Dec 2019 at 09:04, Debasish Kanhar <d...@...> wrote:
Hello any developers following this thread:

As suggested by Dmitry, the CQL adapter uses prepared statements, which suits me in the sense that I'll be using SQL statements (SnowSQL) for Snowflake querying via a DAO. Thus the CQL adapter and the Snowflake adapter I'm building would be similar, so it makes sense to use the former as a reference.

As mentioned before, I'm currently blocked on the getSlice method. I know that the method is used while querying the data, but I'm unable to get my head around how it works internally. A blind implementation might work, but it wouldn't give me an understanding of how it works. If anyone can help me understand it, a similar implementation for Snowflake becomes easier.

As mentioned before, I'm basing my understanding on the CQL adapter. If we look at the getSlice method of CQLKeyColumnValueStore, it makes use of the this.getSlice prepared statement to fulfill the query. this.getSlice is as follows:

this.getSlice = this.session.prepare(select()
        .column(COLUMN_COLUMN_NAME)
        .column(VALUE_COLUMN_NAME)
        .fcall(WRITETIME_FUNCTION_NAME, column(VALUE_COLUMN_NAME)).as(WRITETIME_COLUMN_NAME)
        .fcall(TTL_FUNCTION_NAME, column(VALUE_COLUMN_NAME)).as(TTL_COLUMN_NAME)
        .from(this.storeManager.getKeyspaceName(), this.tableName)
        .where(eq(KEY_COLUMN_NAME, bindMarker(KEY_BINDING)))
        .and(gte(COLUMN_COLUMN_NAME, bindMarker(SLICE_START_BINDING)))
        .and(lt(COLUMN_COLUMN_NAME, bindMarker(SLICE_END_BINDING)))
        .limit(bindMarker(LIMIT_BINDING)));

The this.getSlice statement is used in the public EntryList getSlice() method, which executes the prepared statement above. The following happens (contents of the getSlice method):

final Future<EntryList> result = Future.fromJavaFuture(
        this.executorService,
        this.session.executeAsync(this.getSlice.bind()
                .setBytes(KEY_BINDING, query.getKey().asByteBuffer())
                .setBytes(SLICE_START_BINDING, query.getSliceStart().asByteBuffer())
                .setBytes(SLICE_END_BINDING, query.getSliceEnd().asByteBuffer())
                .setInt(LIMIT_BINDING, query.getLimit())
                .setConsistencyLevel(getTransaction(txh).getReadConsistencyLevel())))
        .map(resultSet -> fromResultSet(resultSet, this.getter));
interruptibleWait(result);

Is the following understanding correct? Anyone with JanusGraph and Cassandra expertise, please help.

That is, with the bindings interpolated, the base query becomes:

.where(eq(KEY_COLUMN_NAME, query.getKey().asByteBuffer()))
        .and(gte(COLUMN_COLUMN_NAME, query.getSliceStart().asByteBuffer()))
        .and(lt(COLUMN_COLUMN_NAME, query.getSliceEnd().asByteBuffer()))
        .limit(query.getLimit()));

Is above interpolation correct?

So, if we were to model this in an RDBMS (e.g. Snowflake; though Snowflake isn't an RDBMS, it is similar in terms of storage and query engine) with 3 columns (key, value, column1) of string data types (varchar holding binary data), would something like the following query be correct?

SELECT .... FROM keyspace WHERE
key = query.getKey().asByteBuffer() and
column1 >= query.getSliceStart().asByteBuffer() and
column1 < query.getSliceEnd().asByteBuffer()
limit query.getLimit()

Does this sort of query sound similar to what we're trying to achieve? If I can understand the actual meaning of the prepared statements here, I can also base my understanding of the rest of the methods required for doing mutations in the underlying backend.
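If the interpolation above is on the right track, the same statement can be sketched as a parameterized SQL query. This is a hedged sketch, not a real adapter: the table and column names (key, column1, value) are taken from the 3-column layout described in this thread, and the four bind parameters would mirror the CQL bindings (key, sliceStart, sliceEnd, limit):

```java
// Hypothetical JDBC-style equivalent of the CQL prepared statement above.
// Column names are assumptions based on the 3-column Cassandra layout
// described in this thread, not a real JanusGraph or Snowflake API.
public class SliceSql {
    public static String sliceQuery(String table) {
        return "SELECT column1, value FROM " + table
             + " WHERE key = ? AND column1 >= ? AND column1 < ?"
             + " ORDER BY column1 LIMIT ?";
    }
}
```

At execution time the caller would bind the serialized row key, the slice start, the slice end, and the limit, in that order, just as the CQL version does with setBytes/setInt.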

Any help is really appreciated, as we are getting tighter and tighter on the deadline for the feasibility PoC of Snowflake as a backend for JanusGraph.

Thanks in advance


On Thursday, 28 November 2019 21:05:09 UTC+5:30, Debasish Kanhar wrote:
Hi Evgeniy,

Thanks for the question. We plan to open source it once implemented, but we are still a long way from implementation. We will be really grateful to anyone in the community who can help in any way to achieve this :-)

On Thursday, 28 November 2019 16:16:27 UTC+5:30, Evgeniy Ignatiev wrote:

Hi.

Is this backend open-source/will be open-sourced?

Best regards,
Evgeniy Ignatiev.

On 11/28/2019 1:40 PM, Debasish Kanhar wrote:
Hi Ryan.

Well, that's a very valid question. The current implementations of backends like Scylla, as you mentioned, are really highly performant. There is no specific problem in mind, but of late I have been dealing with a lot of clients who are migrating their whole system to Snowflake, including the data storage and analytics components. Snowflake is a hot, upcoming data storage and warehousing system.

Those clients are really reluctant to add another storage component to their application. The reasons vary: high costs, added architectural complexity, or duplication of data across storages. But at the same time these clients also want to incorporate graph databases and graph analytics into their applications. This integration is targeted at those customers/clients who have migrated (or are migrating) to Snowflake and want a graph-based component as well. For now, it's simply not possible for them to run JanusGraph against their Snowflake data storage.

Hope I was able to explain it clearly :-)

On Wednesday, 27 November 2019 20:40:52 UTC+5:30, Ryan Stauffer wrote:
Debasish,

This sounds like an interesting project, but I do have a question about your choice of Snowflake.  If I missed your response to this in the email chain, I apologize, but what problems with the existing high-performance backends (Scylla, for instance) are you trying to solve with Snowflake?  The answer to that would probably inform your specific implementation over Snowflake.

Thanks,
Ryan


On Wed, Nov 27, 2019 at 3:18 AM Debasish Kanhar <d...@...> wrote:
Hi Dmitry,

Sorry about the late response. I was working on this project only part time until last week, when we moved into full-time development for this PoC. Thanks to your pointers and Jason's, we have been able to start the development work and have some groundwork to build on :-)

So, we are modelling Snowflake (which is like a SQL file store) as a key-value store by creating two columns, "Key" and "Value", in each table. We are going to define the data type as binary (or stringified binary) so that arbitrary data can be stored (I believe it's a StaticBuffer key and a StaticBuffer value; is that correct?).

Since we are modelling Snowflake as a key-value store, it makes sense to have a SnowFlakeManager class implement OrderedKeyValueStoreManager, as BerkeleyJE does. Is that correct understanding?

The update is that we have almost finished development of the SnowFlakeManager class. The required methods are implemented, like beginTransaction and openDatabase; one particular function not yet done is mutateMany, but it will be, as it in turn calls the KeyValueStore.insert() method.

Also, a lot of the basic functions in KeyValueStore are done, like insert (insert a binary key-value pair), get (get by binary key), and delete (delete a row by binary key). We are stuck at the function getSlice(). What does it do?

We are wondering how getSlice operates. I know that the function is used when querying JanusGraph with Gremlin queries (read operations) (https://github.com/BillBaird/delftswa-aurelius-titan/blob/master/SA-doc/Operations.md#queries). We see that a SliceQuery is generated, which is then executed against the backend to get results.
Now, my question: a slice query is used when querying for the properties of vertices (edges/properties), slicing the relations of a vertex based on filters/conditions. The following steps are followed in the getSlice function (BerkeleyKeyValueStore for berkeleydb and ColumnValueStore for inmemory):
  1. Find the row from the passed key (returns a binary value for the binary key).
  2. Fetch the slice boundaries, i.e. the slice start and end, from the passed query.
  3. Apply the slice boundaries to the value returned in step 1, or fetch the first results by applying the slicing conditions during the scan.
My question relates to the last step. Since my data in the DB is just a binary key and a binary value, how can we apply additional constraints (slice conditions) in the query? There just isn't any additional metadata to apply the slice on, as I have only 2 columns in my table.

Hope my explanation was clear. I primarily want to know how the last step would work in the data model I described above (having 2 columns, one for the key and one for the value, each of stringified binary data type). And is the chosen data model good enough?
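For what it's worth, in the ordered key-value model the slice bounds apply to the key column itself: keys are stored in sorted order, so no extra metadata is needed, and a two-column (key, value) table can answer getSlice with a plain range scan. A hypothetical sketch (table and column names are illustrative, not JanusGraph's API):

```java
// In the ordered key-value model, getSlice is a range scan over the key
// column itself: return all (key, value) pairs with start <= key < end.
// Names here are illustrative, not part of any real adapter.
public class KvRangeScan {
    public static String sliceSql(String table) {
        return "SELECT key, value FROM " + table
             + " WHERE key >= ? AND key < ? ORDER BY key";
    }
}
```

This only works efficiently if the backend can retrieve a key range without scanning the whole table, which is the ordering requirement the docs describe.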

Thanks in advance. And I promise this time my replies will be quicker :-)

On Friday, 25 October 2019 03:17:24 UTC+5:30, Dmitry Kovalev wrote:
Hi Debasish,

here are my 2 cents:

First of all, you need to be clear with yourself as to why exactly you want to build a new backend. E.g. do you find that the existing ones are sub-optimal for certain use cases, or too hard to set up, or do you just want to provide a backend for a cool new database in the hope that it will increase adoption, or something else? In other words, do you have a clear idea of what this new backend will provide which the existing ones do not, e.g. advanced scalability or performance, ease of setup, or just an option for people with existing Snowflake infra to put it to a new use?

Second, you are almost correct, in that basically all you need to implement are three interfaces:
- KeyColumnValueStoreManager, which allows opening multiple instances of named KeyColumnValueStores and provides a certain level of transactional context between the different stores it has opened
- KeyColumnValueStore, which represents an ordered collection of "rows" accessible by keys, where each row is a
- KeyValueStore, basically an ordered collection of key-value pairs, which can be thought of as the individual "columns" of that row and their respective values

Both row and column keys, and the data values are generic byte data.

Have a look at this piece of documentation: https://docs.janusgraph.org/advanced-topics/data-model/    

Possibly the simplest way to understand the "minimum contract" required by Janusgraph from a backend is to look at the inmemory backend. You will see that:  
- KeyColumnValueStoreManager is conceptually a Map of store name ->  KeyColumnValueStore, 
- each  KeyColumnValueStore is conceptually a NavigableMap of "rows" or KeyValueStores (i.e. a "table") ,
- each KeyValueStore is conceptually an ordered collection of key -> value pairs ("columns").

In the most basic case, once you implement these three relatively simple interfaces, Janusgraph can take care of all the translation of graph operations such as adding vertices and edges, and of gremlin queries, into a series of read-write operations over a collection of KCV stores. When you open a new graph, JanusGraph asks the KeyColumnValueStoreManager implementation to create a number of specially named KeyColumnValueStores, which it uses to store vertices, edges, and various indices. It creates a number of "utility" stores which it uses internally for locking, id management etc.

Crucially, whatever stores JanusGraph creates in your backend implementation, and whatever it is using them for, you only need to make sure that you implement those basic interfaces, which allow storing arbitrary byte data and accessing it by arbitrary byte keys.

So for your first "naive" implementation, you most probably shouldn't worry too much about the translation of the graph model to the KCVS model and back - this is largely what JanusGraph itself is about anyway. Just use StoreFeatures to tell JanusGraph that your backend supports only the most basic operations, and concentrate on how best to implement the KCVS interfaces with your underlying database/storage system.
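The core read operation such a backend must serve is a column-range "slice" over one row. Here is a sketch of the idea on a plain sorted map; note this is not the actual KeyColumnValueStore.getSlice signature, which takes query objects over byte buffers.

```java
import java.util.NavigableMap;
import java.util.SortedMap;

public class SliceSketch {
    // Return all columns of one row whose keys fall in [start, end).
    // JanusGraph issues this kind of range query to read a vertex's
    // properties and edges, which are laid out as contiguous columns.
    public static SortedMap<String, byte[]> getSlice(
            NavigableMap<String, byte[]> row, String start, String end) {
        return row.subMap(start, true, end, false);
    }
}
```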

Of course, after that, as you start thinking about supporting better levels of consistency/transaction management across multiple stores, about performance, better utilising native indexing/query mechanisms, separate indexing backends, support for a distributed backend model, etc. - you will find that there is more to it, and this is where you can gain further insight from the documentation, the existing backend sources, and asking more specific questions.

See for example this piece of documentation: https://docs.janusgraph.org/advanced-topics/eventual-consistency/

Hope this helps,
Dmitry

On Thu, 24 Oct 2019 at 21:27, Debasish Kanhar <d...@...> wrote:
I know that JanusGraph needs a column-family type NoSQL database as a storage backend, and hence we have Scylla, Cassandra, HBase, etc. Snowflake isn't a column-family database, but it has a column data type which can store any sort of data. So we could store complete JSON-oriented column-family data there after massaging/pre-processing the data. Is that a practical thought? Is it practical enough to implement?

If it is practical enough to implement, what needs to be done? I'm going through the source code, and I'm basing my ideas on my understanding of the janusgraph-cassandra and janusgraph-berkeleyje projects. Please correct me if my understanding is wrong.

  1. We need to have a StoreManager class like HBaseStoreManager, AbstractCassandraStoreManager, or BerkeleyJEStoreManager, which extends either DistributedStoreManager or LocalStoreManager and implements the KeyColumnValueStoreManager interface, right? These classes need to build a features object, which is more or less the storage connection configuration. They need to have a beginTransaction method which creates the actual connection to the corresponding storage backend. Is that correct?
  2. We need corresponding transaction classes which create the transaction to the corresponding backend, like CassandraTransaction or BerkeleyJETx. The transaction class needs to extend the AbstractStoreTransaction class. Though I can see and understand the transaction being created in BerkeleyJETx, I don't see something similar for CassandraTransaction. So am I missing something in my understanding here?
  3. We need a KeyColumnValueStore class for the backend, like AstyanaxKeyColumnValueStore or BerkeleyJEKeyValueStore, which extends KeyColumnValueStore. This class takes care of massaging the data into key-column format so that it can then be inserted into the corresponding table inside the storage backend.
    1. The questions on my mind are: what will the structure of those classes be?
    2. Are there methods which always need to be present? I see getSlice() being used across all these classes. How do they work?
    3. Do they just convert incoming Gremlin queries into a KeyColumnValue structure?
    4. Are there any other classes I'm missing, or are these three the only ones that need to be modified to create a new storage backend?
    5. Also, if these three are the only classes needed, and let's say we succeed in using Snowflake as a storage backend, how does the read/query aspect of JanusGraph get solved? Are any changes needed on that end as well, or is JanusGraph abstracted enough that it can just start reading from the new source?
  4. Also, I thought there would be some classes which read in "gremlin queries", do certain pre-processing into certain (tabular) data structures, and then push them through some connection to the respective backends. This is where we need help: is there a way to visualize those objects after pre-processing, store them as-is in Snowflake, and reuse them to fulfill Gremlin queries?

I know we can store arbitrary objects in Snowflake; I'm just looking at the changes needed at the JanusGraph level to achieve this.

Any help will be really appreciated.

Thanks in Advance.
--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jan...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/8169f717-9923-478d-b7f1-28d6ee894e9d%40googlegroups.com.
-- 
Best regards,
Evgeniy Ignatiev.


Re: [DISCUSS] Developer chat (+FoundationDB chat)

Oleksandr Porunov <alexand...@...>
 

I rarely check Gitter but I am totally OK with adding new specific channels.

My vote is +1

On Saturday, April 4, 2020 at 9:20:41 AM UTC-7, Debasish Kanhar wrote:
Makes sense. If we have enough people interested in a specific topic, we can create a separate channel for it. :-)

I would say +1 for this.

If we can have Slack it would be a lot better, as I feel it has wider acceptance, but anything is okay :-)

On Wednesday, 1 April 2020 12:18:39 UTC+5:30, Jan Jansen wrote:
+1 for this.

Besides Gitter, we should also think about Slack or Discord.


  • Slack is much more common in the industry than Gitter.
  • Discord has gained momentum in open-source communities in recent years and also supports voice channels directly.

On Monday, March 30, 2020 at 12:40:23 PM UTC+2, Florian Hockmann wrote:
Hi,
we currently have a public chat on Gitter for JanusGraph that is mainly used for questions by users. I think it would be helpful to have another room to discuss the development of JanusGraph. It should only be used for quick informal discussions, as all formal decisions should be made here in the Google group, where they get more visibility and people have more time to voice their opinion. But a chat makes it easier to have quick discussions or to coordinate bigger development tasks.

In addition to a general developer chat room, I think that it might also be useful in the long term to create rooms for certain areas of development, e.g., for the driver or for a specific backend. Contributors have already asked for a chat to coordinate the development of the FoundationDB backend. So I suggest that we also create a dedicated room for the FoundationDB backend, which is kind of a special case: it's still in an early state, not an official backend (yet?), and there apparently already exist attempts in different organizations to improve it, which in my opinion shows the need for better coordination.

Are there any concerns with this? Otherwise I'll go ahead and create the rooms.

TLDR: I suggest that we create two new rooms in our Gitter JanusGraph organization:
  1. JanusGraph Development
  2. JanusGraph FoundationDB
Regards,
Florian


Re: [VOTE] JanusGraph 0.5.1 release

Oleksandr Porunov <alexand...@...>
 

gremlin-server has been fixed. The release jars, Sonatype artifacts, and git tag have been updated.


On Wednesday, April 1, 2020 at 12:30:31 PM UTC-7, Oleksandr Porunov wrote:
I have submitted PR to fix the problem with default gremlin-server : https://github.com/JanusGraph/janusgraph/pull/2067

I also checked that Gremlin Server has the same problem in version 0.4.0. It doesn't even start by default in version 0.3.1, so this bug has been around for a long time.
I also agree with your suggestions about the naming of releases. Maybe it is better to keep `janusgraph-0.5.1.zip` as the full JanusGraph version and `janusgraph-0.5.1-truncated.zip` as the truncated version.

On Wednesday, April 1, 2020 at 9:54:50 AM UTC-7, Florian Hockmann wrote:
It fails for both versions. The big problem I see is that our docs say this about it:

By default, this will instruct JanusGraph to use its own inmemory backend instead of a dedicated database server.

so it should really not expect an externally running Cassandra or Elasticsearch. This problem was probably already present at least in 0.5.0, but I think it gets worse with this release because we now have the truncated zip archive as the default distribution (it's named just janusgraph-[version].zip after all, not janusgraph-[version]-truncated.zip), and that no longer contains an easy way to start JanusGraph Server as described in the docs. For the full distribution, users could at least switch back to janusgraph.sh, which should still work.
That is why I think we should really fix this before releasing a truncated distribution, or, if we really want to release now, we should treat the full archive as the default distribution and the truncated one as an addition for users who want to save some disk space and know that they have to manage backends on their own.

Am Mittwoch, 1. April 2020 18:21:14 UTC+2 schrieb Oleksandr Porunov:
Florian, thank you for catching this. I haven't checked yet, but does "./bin/gremlin-server.sh start" fail on both the truncated and full JanusGraph versions? I.e., I assume you should run CQL and ES externally when you are using the truncated JanusGraph version. We already state in our documentation that "This requires to download janusgraph-full-0.5.1.zip instead of the default janusgraph-0.5.1.zip". So I assume it is OK that the default ./bin/gremlin-server.sh start fails in the truncated version, because that is described in the documentation.
Also, if it fails in the full version as well, does it work in any older JanusGraph version (e.g. 0.4.0 or 0.3.0)? If it doesn't work with older JanusGraph versions either, then we haven't introduced any new bugs, and the bugfix could be postponed to a future release (unless the bug can be fixed quickly or is critical). That said, I haven't yet checked the bugs you described; I will try to check them later.

On Wednesday, April 1, 2020 at 7:26:46 AM UTC-7, Florian Hockmann wrote:
I just tried both zip archives, following our docs on local installation, which simply start JanusGraph Server with ./bin/gremlin-server.sh start. This fails silently, which can be noticed when one tries to connect from the console with :remote connect, as that results in this error message: gremlin-groovy is not an available GremlinScriptEngine

This seems to be caused by the fact that gremlin-server.sh uses the default gremlin-server.yaml, which uses CQL and ES.
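For reference, pointing the server at an inmemory graph only requires a minimal properties file (file name illustrative) referenced from the graphs section of gremlin-server.yaml:

```properties
# conf/janusgraph-inmemory.properties (illustrative file name)
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=inmemory
```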

If we now point users at gremlin-server.sh to get started instead of janusgraph.sh, as the latter isn't included in the truncated version, then we need to make sure that it actually works out of the box.

So, my VOTE is -1.

Am Samstag, 28. März 2020 23:27:49 UTC+1 schrieb Jan Jansen:
Hi,
I have tested the truncated binary distribution, using a pre-built Docker image of JanusGraph.
My vote is +1.

Greetings,
Jan

On Thursday, March 26, 2020 at 2:21:41 AM UTC+1, Oleksandr Porunov wrote:
Hello,

We are happy to announce that JanusGraph 0.5.1 is ready for release.

The release artifacts can be found at this location:
        https://github.com/JanusGraph/janusgraph/releases/tag/v0.5.1

A full binary distribution is provided for user convenience:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.5.1/janusgraph-full-0.5.1.zip

A truncated binary distribution is provided:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.5.1/janusgraph-0.5.1.zip

The GPG key used to sign the release artifacts is available at:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.5.1/KEYS

The docs can be found here:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.5.1/janusgraph-0.5.1-doc.zip

The release tag in Git can be found here:
        https://github.com/JanusGraph/janusgraph/tree/v0.5.1

The release notes are available here:
        https://github.com/JanusGraph/janusgraph/blob/v0.5/docs/changelog.md#version-051-release-date-march-25-2020

This [VOTE] will open for the next 3 days --- closing Saturday, March 29, 2020 at 01:30 AM GMT.
All are welcome to review and vote on the release, but only votes from TSC members are binding.
My vote is +1.

Thank you,
Oleksandr Porunov
