
[RESULT][VOTE] JanusGraph 0.5.2 release

Oleksandr Porunov <alexand...@...>
 

This vote is now closed with a total of 3 +1s, no +0s and no -1s. The results are:

BINDING VOTES:

+1  (3 -- Oleksandr Porunov, Florian Hockmann, Jan Jansen)
0   (0)
-1  (0)

NON-BINDING VOTES:

+1 (0)
0  (0)
-1 (0)

Thank you very much,
Oleksandr Porunov


Re: [VOTE] JanusGraph 0.5.2 release

Jan Jansen <faro...@...>
 

I did a quick test of both binary distributions. VOTE +1

I think we could work on releasing an extra distribution for hbase1, which is used for Google Bigtable, if I'm correct.


Re: [VOTE] JanusGraph 0.5.2 release

Florian Hockmann <f...@...>
 

I checked the release notes and did a quick test of both binary distributions. VOTE +1

On Wednesday, May 6, 2020 at 02:51:49 UTC+2, Oleksandr Porunov wrote:

Hello,

We are happy to announce that JanusGraph 0.5.2 is ready for release.

The release artifacts can be found at this location:
        https://github.com/JanusGraph/janusgraph/releases/tag/v0.5.2

A full binary distribution is provided for user convenience:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.5.2/janusgraph-full-0.5.2.zip

A truncated binary distribution is provided:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.5.2/janusgraph-0.5.2.zip

The GPG key used to sign the release artifacts is available at:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.5.2/KEYS

The docs can be found here:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.5.2/janusgraph-0.5.2-doc.zip

The release tag in Git can be found here:
        https://github.com/JanusGraph/janusgraph/tree/v0.5.2

The release notes are available here:
        https://github.com/JanusGraph/janusgraph/blob/v0.5/docs/changelog.md#version-052-release-date-may-3-2020

This [VOTE] will be open for the next 3 days, closing Saturday, May 9, 2020 at 12:50 AM GMT.
All are welcome to review and vote on the release, but only votes from TSC members are binding.
My vote is +1.

Thank you,
Oleksandr Porunov


[VOTE] JanusGraph 0.5.2 release

Oleksandr Porunov <alexand...@...>
 

Hello,

We are happy to announce that JanusGraph 0.5.2 is ready for release.

The release artifacts can be found at this location:
        https://github.com/JanusGraph/janusgraph/releases/tag/v0.5.2

A full binary distribution is provided for user convenience:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.5.2/janusgraph-full-0.5.2.zip

A truncated binary distribution is provided:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.5.2/janusgraph-0.5.2.zip

The GPG key used to sign the release artifacts is available at:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.5.2/KEYS

The docs can be found here:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.5.2/janusgraph-0.5.2-doc.zip

The release tag in Git can be found here:
        https://github.com/JanusGraph/janusgraph/tree/v0.5.2

The release notes are available here:
        https://github.com/JanusGraph/janusgraph/blob/v0.5/docs/changelog.md#version-052-release-date-may-3-2020

This [VOTE] will be open for the next 3 days, closing Saturday, May 9, 2020 at 12:50 AM GMT.
All are welcome to review and vote on the release, but only votes from TSC members are binding.
My vote is +1.

Thank you,
Oleksandr Porunov


JanusGraph (Cassandra + ES), OLAP bulk ingestion: issues with ES (SecondaryPersistence used to store indexes) commit and rollback functionality

Ramesh Babu Y <ramesh...@...>
 

We are using JanusGraph with a Cassandra + ES combination and ingesting data in OLAP mode. Because we submit batch requests, we call the commit method to commit the transaction and thereby start the actual data ingestion. The problem is that JanusGraph first performs the Cassandra commit, and if anything goes wrong during that commit the JanusGraph core classes roll back. In the same core classes, however, no rollback is performed when something goes wrong during the ES commit. Below is the JanusGraph class where we identified the issue.

In janusgraph-core.jar, this is the method that gets called internally, in StandardJanusGraph.class.

In the code below, even if there are errors during the ES commit, nothing is re-thrown; the errors are stored in a Map and afterwards only printed as log statements. This leads to data inconsistency: in the same code, any exception during the Cassandra commit causes the transaction to be rolled back, but for ES this does not happen.

When we look at the latest code for StandardJanusGraph.class, no exception is re-thrown there either, and a comment in the code says this needs to be cleaned up. Does that mean this is not completely implemented?

https://github.com/JanusGraph/janusgraph/blob/6bb2ba926b6cac2669f608f9461177d964ae0be0/janusgraph-core/src/main/java/org/janusgraph/graphdb/database/StandardJanusGraph.java#L761

Is this an issue with the JanusGraph core jars? Has a fix already been made?

public void commit(Collection<InternalRelation> addedRelations, Collection<InternalRelation> deletedRelations, StandardJanusGraphTx tx) {
    if (!addedRelations.isEmpty() || !deletedRelations.isEmpty()) {
        log.debug("Saving transaction. Added {}, removed {}", addedRelations.size(), deletedRelations.size());
        if (!tx.getConfiguration().hasCommitTime()) {
            tx.getConfiguration().setCommitTime(this.times.getTime());
        }

        Instant txTimestamp = tx.getConfiguration().getCommitTime();
        long transactionId = this.txCounter.incrementAndGet();
        if (!tx.getConfiguration().hasAssignIDsImmediately()) {
            this.idAssigner.assignIDs(addedRelations);
        }

        BackendTransaction mutator = tx.getTxHandle();
        boolean acquireLocks = tx.getConfiguration().hasAcquireLocks();
        boolean hasTxIsolation = this.backend.getStoreFeatures().hasTxIsolation();
        boolean logTransaction = this.config.hasLogTransactions() && !tx.getConfiguration().hasEnabledBatchLoading();
        KCVSLog txLog = logTransaction ? this.backend.getSystemTxLog() : null;
        TransactionLogHeader txLogHeader = new TransactionLogHeader(transactionId, txTimestamp, this.times);

        try {
            if (logTransaction) {
                Preconditions.checkNotNull(txLog, "Transaction log is null");
                txLog.add(txLogHeader.serializeModifications(this.serializer, LogTxStatus.PRECOMMIT, tx, addedRelations, deletedRelations), txLogHeader.getLogKey());
            }

            boolean hasSchemaElements = !Iterables.isEmpty(Iterables.filter(deletedRelations, SCHEMA_FILTER)) || !Iterables.isEmpty(Iterables.filter(addedRelations, SCHEMA_FILTER));
            Preconditions.checkArgument(!hasSchemaElements || !tx.getConfiguration().hasEnabledBatchLoading() && acquireLocks, "Attempting to create schema elements in inconsistent state");
            StandardJanusGraph.ModificationSummary commitSummary;
            if (hasSchemaElements && !hasTxIsolation) {
                BackendTransaction schemaMutator = this.openBackendTransaction(tx);

                try {
                    commitSummary = this.prepareCommit(addedRelations, deletedRelations, SCHEMA_FILTER, schemaMutator, tx, acquireLocks);

                    assert commitSummary.hasModifications && !commitSummary.has2iModifications;
                } catch (Throwable var42) {
                    schemaMutator.rollback();
                    throw var42;
                }

                try {
                    schemaMutator.commit();
                } catch (Throwable var40) {
                    log.error("Could not commit transaction [" + transactionId + "] due to storage exception in system-commit", var40);
                    throw var40;
                }
            }

            commitSummary = this.prepareCommit(addedRelations, deletedRelations, hasTxIsolation ? NO_FILTER : NO_SCHEMA_FILTER, mutator, tx, acquireLocks);
            if (commitSummary.hasModifications) {
                String logTxIdentifier = tx.getConfiguration().getLogIdentifier();
                boolean hasSecondaryPersistence = logTxIdentifier != null || commitSummary.has2iModifications;
                if (logTransaction) {
                    txLog.add(txLogHeader.serializePrimary(this.serializer, hasSecondaryPersistence ? LogTxStatus.PRIMARY_SUCCESS : LogTxStatus.COMPLETE_SUCCESS), txLogHeader.getLogKey(), mutator.getTxLogPersistor());
                }

                try {
                    mutator.commitStorage();
                } catch (Throwable var39) {
                    log.error("Could not commit transaction [" + transactionId + "] due to storage exception in commit", var39);
                    throw var39;
                }

                if (hasSecondaryPersistence) {
                    LogTxStatus status = LogTxStatus.SECONDARY_SUCCESS;
                    Map<String, Throwable> indexFailures = ImmutableMap.of();
                    boolean userlogSuccess = true;

                    try {
                        // Index (e.g. Elasticsearch) failures come back as a map instead of being thrown,
                        // so they never reach the outer catch block that calls mutator.rollback().
                        indexFailures = mutator.commitIndexes();
                        if (!indexFailures.isEmpty()) {
                            status = LogTxStatus.SECONDARY_FAILURE;
                            for (Map.Entry<String, Throwable> entry : indexFailures.entrySet()) {
                                log.error("Error while committing index mutations for transaction [" + transactionId + "] on index: " + entry.getKey(), entry.getValue());
                            }
                        }

                        if (logTxIdentifier != null) {
                            try {
                                userlogSuccess = false;
                                Log userLog = this.backend.getUserLog(logTxIdentifier);
                                Future<Message> env = userLog.add(txLogHeader.serializeModifications(this.serializer, LogTxStatus.USER_LOG, tx, addedRelations, deletedRelations));
                                if (env.isDone()) {
                                    try {
                                        env.get();
                                    } catch (ExecutionException var37) {
                                        throw var37.getCause();
                                    }
                                }

                                userlogSuccess = true;
                            } catch (Throwable var38) {
                                status = LogTxStatus.SECONDARY_FAILURE;
                                log.error("Could not user-log committed transaction [" + transactionId + "] to " + logTxIdentifier, var38);
                            }
                        }
                    } finally {
                        if (logTransaction) {
                            try {
                                txLog.add(txLogHeader.serializeSecondary(this.serializer, status, (Map)indexFailures, userlogSuccess), txLogHeader.getLogKey());
                            } catch (Throwable var36) {
                                log.error("Could not tx-log secondary persistence status on transaction [" + transactionId + "]", var36);
                            }
                        }

                    }
                } else {
                    mutator.commitIndexes();
                }
            } else {
                mutator.commit();
            }

        } catch (Throwable var43) {
            log.error("Could not commit transaction [" + transactionId + "] due to exception", var43);

            try {
                mutator.rollback();
            } catch (Throwable var35) {
                log.error("Could not roll-back transaction [" + transactionId + "] after failure due to exception", var35);
            }

            if (var43 instanceof RuntimeException) {
                throw (RuntimeException)var43;
            } else {
                throw new JanusGraphException("Unexpected exception", var43);
            }
        }
    }
}
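
The asymmetry described in this message can be reduced to a small, self-contained model. The sketch below is not JanusGraph code: ToyBackendTx, toyTx and CommitAsymmetrySketch are hypothetical names, and the only behaviour taken from the quoted method is that a storage-commit failure is rolled back and rethrown while index-commit failures are collected in a map and logged.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of the commit flow described above. Every name here is hypothetical. */
final class CommitAsymmetrySketch {

    /** Stand-in for a backend transaction; this is not the JanusGraph BackendTransaction API. */
    interface ToyBackendTx {
        void commitStorage() throws Exception;   // primary store (Cassandra in this thread)
        Map<String, Throwable> commitIndexes();  // secondary indexes (ES in this thread)
        void rollback();
    }

    /**
     * A storage failure triggers rollback and is rethrown; index failures are only logged.
     * The failure map is returned solely so the sketch can be inspected;
     * the quoted method returns void.
     */
    static Map<String, Throwable> commit(ToyBackendTx tx) {
        try {
            tx.commitStorage();
        } catch (Exception e) {
            tx.rollback();
            throw new IllegalStateException("storage commit failed", e);
        }
        Map<String, Throwable> indexFailures = tx.commitIndexes();
        for (Map.Entry<String, Throwable> entry : indexFailures.entrySet()) {
            // Logged only: no rollback and no exception for the caller.
            System.err.println("index commit failed on " + entry.getKey()
                    + ": " + entry.getValue().getMessage());
        }
        return indexFailures;
    }

    /** Builds a toy transaction whose failures are selected by the two flags. */
    static ToyBackendTx toyTx(boolean storageFails, boolean indexFails, boolean[] rolledBack) {
        return new ToyBackendTx() {
            @Override public void commitStorage() throws Exception {
                if (storageFails) {
                    throw new Exception("storage unavailable");
                }
            }
            @Override public Map<String, Throwable> commitIndexes() {
                Map<String, Throwable> failures = new LinkedHashMap<>();
                if (indexFails) {
                    failures.put("search", new RuntimeException("index unavailable"));
                }
                return failures;
            }
            @Override public void rollback() {
                rolledBack[0] = true;
            }
        };
    }

    public static void main(String[] args) {
        boolean[] rolledBack = {false};
        Map<String, Throwable> failures = commit(toyTx(false, true, rolledBack));
        System.out.println("index failures: " + failures.keySet()
                + ", rolled back: " + rolledBack[0]);
    }
}
```

In this model a caller that needs to react to index failures has to inspect the returned map, because no exception reaches it. Whether JanusGraph should instead rethrow or compensate is the open question raised above.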


Re: The memory can only be add() during vertex program execute

Oleksandr Porunov <alexand...@...>
 

Hi Anjani,

This is the JanusGraph developers channel, which is for discussing the JanusGraph development process. You can get help with your issue in the janusgraph-users group: https://groups.google.com/forum/#!forum/janusgraph-users


On Wednesday, April 29, 2020 at 10:27:53 AM UTC-7, anj...@... wrote:
Hi All,

We have a custom vertex program which finds connected nodes. We have a specific requirement to remove some duplicate data, and for that we collect all the data using the collectAsMap() method.
However, we get the error below under high load. Under low load it works fine.


Data Store: Cassandra
Index Store: Elasticsearch

Data :  ~300M Nodes & ~371M Edges. 
Spark Cluster: 100 Executors, 7 Cores



Job aborted due to stage failure: Task 199 in stage 14.0 failed 4 times, most recent failure: Lost task 199.3 in stage 14.0 (TID 9825, <ip>, executor 97): java.lang.IllegalArgumentException: The memory can only be add() during vertex program execute: gremlin.connectedNodeVertexProgram.voteToHalt
	at org.apache.tinkerpop.gremlin.process.computer.Memory$Exceptions.memoryAddOnlyDuringVertexProgramExecute(Memory.java:161)



I am not able to figure out why this exception is thrown; I don't see any OOM error. Please share any thoughts or pointers.


Appreciate your help.


Thanks,

Anjani




The memory can only be add() during vertex program execute

anjani...@...
 

Hi All,

We have a custom vertex program which finds connected nodes. We have a specific requirement to remove some duplicate data, and for that we collect all the data using the collectAsMap() method.
However, we get the error below under high load. Under low load it works fine.


Data Store: Cassandra
Index Store: Elasticsearch

Data :  ~300M Nodes & ~371M Edges. 
Spark Cluster: 100 Executors, 7 Cores



Job aborted due to stage failure: Task 199 in stage 14.0 failed 4 times, most recent failure: Lost task 199.3 in stage 14.0 (TID 9825, <ip>, executor 97): java.lang.IllegalArgumentException: The memory can only be add() during vertex program execute: gremlin.connectedNodeVertexProgram.voteToHalt
	at org.apache.tinkerpop.gremlin.process.computer.Memory$Exceptions.memoryAddOnlyDuringVertexProgramExecute(Memory.java:161)



I am not able to figure out why this exception is thrown; I don't see any OOM error. Please share any thoughts or pointers.


Appreciate your help.


Thanks,

Anjani




Re: [DISCUSS] Developer chat (+FoundationDB chat)

Henry Saputra <henry....@...>
 

Thanks for driving this, Florian


On Thu, Apr 16, 2020 at 6:25 AM Florian Hockmann <f...@...> wrote:
You should all be able to join. I'll also create a PR to add it to our README.md.

On Wednesday, April 15, 2020 at 14:16:36 UTC+2, Florian Hockmann wrote:
It looks like we have a wide consensus on starting a new channel for development discussions, but different opinions on whether we should directly create a dedicated channel for FoundationDB and also on whether we should switch to a different system than Gitter.
So, I will create a new channel janusgraph-dev on Gitter and we can then see whether we need dedicated channels for, e.g., FoundationDB. If contributors have a strong opinion on moving to a different system, then please start a different thread for that so we can discuss it in general as Henry also already suggested. Since we didn't reach a consensus on this topic here, I don't want to let it stop the creation of the dev channel.

If that sounds OK to everyone, I will create the janusgraph-dev channel tomorrow on Gitter.

On Tuesday, April 14, 2020 at 23:52:58 UTC+2, Henry Saputra wrote:
Sorry, I just saw this discussion. Thanks for pinging me, Misha.

As Misha and Florian mentioned, we did some investigating and exploring to decide which chat tool to use for JanusGraph.
We chose Gitter because of its low barrier to joining and starting discussions and its low maintenance compared to Slack.

The discussion Florian started was about a new channel on Gitter for JanusGraph development discussions, which I think is a great idea, so +1 for it.
This will give us a real-time mechanism, in addition to our mailing list, to talk about development ideas and progress.
As for FoundationDB, I think we could just use the dev channel to discuss it instead of a dedicated channel.

We could discuss moving to or embracing another chat tool, like Slack or Discord, in another thread as a separate topic.

Thanks,

- Henry


On Tue, Apr 14, 2020 at 4:19 AM Florian Hockmann <f...@...> wrote:
I think Misha has good arguments for staying on Gitter. I personally don't think that we need a high entry barrier for a developer chat as we currently also don't get many non-dev questions in the Google group for developers. People instead ask in janusgraph-users (if they ask in a Google group that is) and I don't see why that should be different on Gitter as they can still ask their usage questions in the main Gitter chat that is already quite active.
We should also try to stay as open as possible with our development discussions to not exclude some people in my opinion. If we notice that we still get too many usage questions in a dev chat, then we can still create a private chat room in Gitter. (It looks at least from a quick search that it's possible to do that with Gitter.)

The other main argument against Gitter and pro Discord I see would be the voice chat, but I'm not sure how important that actually is for us.

Another topic that was discussed here was whether we actually need a dedicated channel just for FoundationDB. I don't have a strong opinion either way, but I suggest that we then simply start with a general dev channel and see whether we have a lot of FoundationDB specific discussions there and then we can still create a dedicated channel.

On Tuesday, April 7, 2020 at 18:34:24 UTC+2, Jan Jansen wrote:


On 7. Apr 2020, at 18:09, 'Misha Brukman' via JanusGraph developers <jan...@...> wrote:


On Tue, Apr 7, 2020 at 11:36 AM 'Jan Jansen' via JanusGraph developers <jan...@...> wrote:
Some of your reasons are exactly the reasons why I'm against Gitter.
Currently, Gitter is just a support chat for JanusGraph. That channel would be hard to use for developer-focused talks. Therefore, we would have to create a private channel or a highly moderated channel just for development-focused topics. I think having an extra platform where you have to sign up separately would massively reduce the support questions.

Are you trying to have the same separation on Gitter as we do with janusgraph-users@ and janusgraph-dev@ mailing lists? Would creating separate channels on Gitter not address this? Or are you saying you've already done this, and it's not helping?
I was thinking about the separation we have with the Google groups. I think just a different channel wouldn't help, because some people simply ignore these rules. Take GitHub issues, for example: there is a template for questions that says to ask in the Google group, and people ignore it. In the Google group, you see that less often. I think an extra barrier between the two channels would separate these topics better.
 
I didn't know about the history limit in Slack.


The free version of Slack limits the workspace to 10000 recent messages; however, my understanding is that this is a global number, not a per-channel number. In some free workspaces I've seen, for less active channels, this means ~zero history because other channels are so active, that they continuously exhaust the 10K most recent messages, so if you're not there to see the message, you'll see almost nothing.
 
I mentioned also Discord which also includes voice channels.

I didn't realize folks would enjoy live voice chat, but it does sound like an interesting option.

Misha 

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jan...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/CANgM2oN0Eh034Rw5qEwWZCFDk8-rJ2vG1bDejgs9LoddbVup9g%40mail.gmail.com.



Re: Batching Queries to backend for faster performance

Debasish Kanhar <d.k...@...>
 

Thanks Pavel.

I've tried all of those, and together with query optimizations they have helped me reduce the execution time by almost 50%. But that's still too slow for my use case. :-)

I'm looking at more batching options wherever I can.

You can check out my discussion on gremlin-users: https://groups.google.com/forum/#!topic/gremlin-users/RaIHVbDE5rk for more clarity on the requirement and details. Any ideas on how to implement this would be a great help. :-)

On Thursday, 16 April 2020 21:20:40 UTC+5:30, Pavel Ershov wrote:

JG has three options to reduce the number of queries: https://docs.janusgraph.org/basics/configuration-reference/#query

PROPERTY_PREFETCHING -- enabled by default
USE_MULTIQUERY -- disabled by default
BATCH_PROPERTY_PREFETCHING -- disabled by default

The last two require the store to implement OrderedKeyValueStore.getSlices and to enable the multiQuery feature.



On Wednesday, April 8, 2020 at 19:10:38 UTC+3, Debasish Kanhar wrote:
For anyone following this thread: my primary question was how to implement a multi-get type of implementation for my backend. Do check Marko's comment (https://groups.google.com/d/msg/gremlin-users/QMVhLIPiGRE/Yf4ByrlrEQAJ) for clarification on what multi-get is.

The Snowflake backend interface I've written doesn't support multiQuery yet. Is implementing multiQuery as simple as the steps mentioned below?

Do we need to uncomment the following in the StoreManager class?


And setting
features.supportMultiQuery = true;

And implementing the following method in KeyValueStore?

Or are there any other changes that need to be made to implement the multiQuery feature for my backend?
The reason I feel multiQuery will help us is the following simple use case:
g.V(20520).as("root").bothE().as("e").barrier().otherV().as("oth").barrier().project("r","e","o").by(select("root").valueMap()).by(select("e").valueMap()).by(select("oth").valueMap()).barrier().dedup("r", "e", "o").profile()



Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
GraphStep(vertex,[20520])@[root]                                       1           1        1157.329     2.66
JanusGraphVertexStep(BOTH,edge)@[e]                                   12          12        3854.693     8.86
    \_condition=(EDGE AND visibility:normal)
    \_isFitted=true
    \_vertices=1
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@801a60ee
    \_orders=[]
    \_isOrdered=true
    \_multi=true
  optimization                                                                                 7.573
  backend-query                                                       12                     828.938
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@801a60ee
NoOpBarrierStep                                                       12          12           1.828     0.00
EdgeOtherVertexStep@[oth]                                             12          12           1.132     0.00
NoOpBarrierStep                                                       12          12           0.976     0.00
ProjectStep([r, e, o],[[SelectOneStep(last,root...                    12          12       38502.413    88.46
  SelectOneStep(last,root)                                            12          12           0.508
  PropertyMapStep(value)                                              12          12       12776.430
  SelectOneStep(last,e)                                               12          12           0.482
  PropertyMapStep(value)                                              12          12       15211.454
  SelectOneStep(last,oth)                                             12          12           0.376
  PropertyMapStep(value)                                              12          12       10508.737
NoOpBarrierStep                                                       12          12           5.692     0.01
DedupGlobalStep([r, e, o])                                            12          12           1.925     0.00
                                            >TOTAL                     -           -       43525.993        -

As you can see, retrieving the properties of graph elements (vertices and edges) is the most time-consuming step. On further analysis I realized this is because retrieving a single property from my backend is a single query to the backend. Thus, for N elements (vertices and edges), each with M properties, the total number of calls is N*M, which slows down the whole process and increases the execution time.
Maybe that's the reason why the Properties() step is the slowest step with my backend.
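
To make the N*M estimate concrete, here is a back-of-the-envelope sketch. The numbers in main are assumptions for illustration only: 36 is the three PropertyMapStep rows times 12 traversers in the profile above, M = 5 properties per element is invented, and 100 ms per backend query is a round figure in line with the "100s of ms" mentioned earlier in this thread. The class and method names are hypothetical.

```java
/** Rough round-trip arithmetic for the N*M estimate; all inputs are illustrative assumptions. */
final class RoundTripEstimate {

    /** One backend query per property of each element: N * M calls. */
    static long perPropertyCalls(long elements, long propertiesPerElement) {
        return Math.multiplyExact(elements, propertiesPerElement);
    }

    /** Lower bound on wall-clock time when the calls run strictly one after another. */
    static long sequentialMillis(long calls, long millisPerCall) {
        return Math.multiplyExact(calls, millisPerCall);
    }

    public static void main(String[] args) {
        long elements = 36;    // 3 PropertyMapStep rows x 12 traversers in the profile
        long properties = 5;   // hypothetical M
        long latencyMs = 100;  // assumed per-query latency

        long calls = perPropertyCalls(elements, properties);
        System.out.println("one query per property: " + calls + " calls, at least "
                + sequentialMillis(calls, latencyMs) + " ms");
        System.out.println("one query per element:  " + elements + " calls, at least "
                + sequentialMillis(elements, latencyMs) + " ms");
    }
}
```

With these assumed inputs, fetching per property costs 36 * 5 = 180 sequential calls (at least 18000 ms), while fetching all properties of an element in one call costs 36 calls (at least 3600 ms). The measured times in the profile will differ, since the real M and per-query latency are not stated here.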

So, will implementing multiQuery improve performance in such a scenario, and is there anything else that needs to be implemented as well? If so, I can quickly implement this, we can immediately see some performance improvements, and adding the new backend moves closer to the finish line. :-)

Thanks in advance.

On Tuesday, 7 April 2020 00:18:38 UTC+5:30, Debasish Kanhar wrote:
Hi All,

Well, the title may be misleading, as I couldn't think of a better one. Let me give a brief overview of the issue and the possible solutions we are considering. We will need your suggestions, and help connecting with anyone in the community who can help us with the problem. :-)

So, we have a requirement to implement Snowflake as a backend for JanusGraph (https://groups.google.com/forum/#!topic/janusgraph-dev/9JrMYF_01Cc). We were able to model Snowflake as a KeyValueStore, and we successfully created an interface layer that extends OrderedKeyValueStore to interact with Snowflake (https://gitlab.com/system-soft-technologies-opensource/janus-snowflake). The problem we now face, as anticipated, is slow response times, for both READ and WRITE. This is because every Gremlin query that TinkerPop/JanusGraph issues is broken down into multiple queries, which are executed one after another in sequential order to build the response to the Gremlin query.

For example, the attached file (query breakdown.txt) shows how a simple Gremlin query like g.V().has("node_label", "user").limit(5).valueMap(true) is broken down into a set of multiple edgestore queries. (I'm not including queries to graphindex and janusgraph_ids, as those are low in volume.) We have also been able to capture the order in which the queries are executed (the first line is the first query, the second line is the second, and so on).

My question is: is there some way we can batch the queries here? Since Snowflake is a data warehouse, each query takes hundreds of milliseconds to execute. Thus, having 100 sub-queries, as in the example file, easily takes 10 seconds at minimum. We would like to optimize that by batching the queries so that they can be executed together and their responses reconciled together.

For example if the flow is as follows:















Can we change the flow above, which is the generic flow of TinkerPop databases, to something like the flow below by introducing an accumulator/aggregator step?

Instead of interacting directly with the Snowflake backend through our interface, we bring in an aggregation step in between.
The aggregation step accumulates all the getSlice queries (start key, end key, and store name) until all the queries that can be compartmentalized are accumulated.
Once they are accumulated, it executes all of them together against the backend.
Once they are executed, all the query responses come back to the aggregation step (output), which breaks them down according to the input queries and sends them back to GraphStep for reconciliation and for building the output of the Gremlin query.
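
A minimal sketch of the aggregation step described above, under loud assumptions: SliceRequest, BatchBackend, Accumulator and InMemoryBackend are hypothetical names, not JanusGraph or TinkerPop types, and the in-memory TreeMap backend with half-open [startKey, endKey) ranges only stands in for Snowflake. It shows the accumulate, execute-together, and split-results-by-request parts; it does not address how JanusGraph would decide which queries are independent enough to accumulate.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of an accumulate-then-execute step; all types here are hypothetical. */
final class SliceBatcherSketch {

    /** One pending slice query: store name plus a half-open key range [startKey, endKey). */
    static final class SliceRequest {
        final String store;
        final String startKey;
        final String endKey;

        SliceRequest(String store, String startKey, String endKey) {
            this.store = store;
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    /** A backend that answers many slice requests in a single round trip. */
    interface BatchBackend {
        List<List<String>> getSlices(List<SliceRequest> batch);
    }

    /** Collects requests, then sends them together and returns results in request order. */
    static final class Accumulator {
        private final BatchBackend backend;
        private final List<SliceRequest> pending = new ArrayList<>();

        Accumulator(BatchBackend backend) {
            this.backend = backend;
        }

        /** Queues a request and returns its position in the next flush() result. */
        int add(SliceRequest request) {
            pending.add(request);
            return pending.size() - 1;
        }

        /** Executes every queued request in one backend call. */
        List<List<String>> flush() {
            List<SliceRequest> batch = new ArrayList<>(pending);
            pending.clear();
            if (batch.isEmpty()) {
                return new ArrayList<>();
            }
            List<List<String>> results = backend.getSlices(batch);
            if (results.size() != batch.size()) {
                throw new IllegalStateException("backend returned a result count that differs from the batch size");
            }
            return results;
        }
    }

    /** In-memory stand-in for the real backend; counts round trips for illustration. */
    static final class InMemoryBackend implements BatchBackend {
        private final Map<String, TreeMap<String, String>> stores = new HashMap<>();
        int roundTrips = 0;

        void put(String store, String key, String value) {
            stores.computeIfAbsent(store, s -> new TreeMap<>()).put(key, value);
        }

        @Override
        public List<List<String>> getSlices(List<SliceRequest> batch) {
            roundTrips++;
            List<List<String>> out = new ArrayList<>();
            for (SliceRequest r : batch) {
                TreeMap<String, String> store = stores.getOrDefault(r.store, new TreeMap<>());
                out.add(new ArrayList<>(store.subMap(r.startKey, r.endKey).values()));
            }
            return out;
        }
    }

    public static void main(String[] args) {
        InMemoryBackend backend = new InMemoryBackend();
        backend.put("edgestore", "a1", "x");
        backend.put("edgestore", "b1", "z");
        Accumulator accumulator = new Accumulator(backend);
        int first = accumulator.add(new SliceRequest("edgestore", "a", "b"));
        int second = accumulator.add(new SliceRequest("edgestore", "b", "c"));
        List<List<String>> results = accumulator.flush();
        System.out.println(results.get(first) + " " + results.get(second)
                + " in " + backend.roundTrips + " round trip(s)");
    }
}
```

Returning results in request order keeps the reconciliation step trivial: each caller keeps the index returned by add() and reads its own slice after flush().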

As for what we have done so far: we edited the JanusGraph core classes so that we can track the flow of information from one class to another whenever a Gremlin query is executed. That way we know which classes are called, iteratively, until we reach our interface's getSlice method, and we can look for repetitive patterns in the query. For that we have produced approximately 6000 lines of custom logs, which we are tracking.
After analyzing the logs, we arrived at the following flow of classes:

My question is: is this possible from the TinkerPop perspective? From the JanusGraph perspective? Our project is ready to pay any JanusGraph or TinkerPop expert part-time as a freelancer. We are looking for any experts in the domain who can help us solve this problem. The potential payoff of this use case is tremendous: it could also improve performance in existing backends and help us execute many memory-intensive queries much faster.

Thanks



Re: Batching Queries to backend for faster performance

Pavel Ershov <owner...@...>
 


JG has three options to reduce the number of queries: https://docs.janusgraph.org/basics/configuration-reference/#query

PROPERTY_PREFETCHING -- enabled by default
USE_MULTIQUERY -- disabled by default
BATCH_PROPERTY_PREFETCHING -- disabled by default

The last two require the store to implement OrderedKeyValueStore.getSlices and to enable the multiQuery feature.
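
As a hedged illustration of the three options listed above, the sketch below writes them out as configuration properties. The key names (query.fast-property, query.batch, query.batch-property-prefetch) are the ones understood to correspond to these constants in the query section of the configuration reference linked above; they should be verified against the JanusGraph version in use. The class name is hypothetical, and the sketch only builds a java.util.Properties object without opening a graph.

```java
import java.util.Properties;

/** Builds the three query options as plain properties; keys should be checked against the config reference. */
final class QueryBatchingConfigSketch {

    static Properties queryOptions() {
        Properties p = new Properties();
        p.setProperty("query.fast-property", "true");           // PROPERTY_PREFETCHING (already the default)
        p.setProperty("query.batch", "true");                   // USE_MULTIQUERY
        p.setProperty("query.batch-property-prefetch", "true"); // BATCH_PROPERTY_PREFETCHING
        return p;
    }

    public static void main(String[] args) {
        Properties p = queryOptions();
        for (String key : p.stringPropertyNames()) {
            System.out.println(key + "=" + p.getProperty(key));
        }
    }
}
```

Enabling the last two only helps if the storage backend actually supports multi-queries, as noted above.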



On Wednesday, April 8, 2020 at 19:10:38 UTC+3, Debasish Kanhar wrote:

For anyone following this thread: my primary question was how to implement a multi-get type of implementation for my backend. Do check Marko's comment (https://groups.google.com/d/msg/gremlin-users/QMVhLIPiGRE/Yf4ByrlrEQAJ) for clarification on what multi-get is.

The Snowflake backend interface I've written doesn't support multiQuery yet. Is implementing multiQuery as simple as the steps mentioned below?

Do we need to uncomment the following in the StoreManager class?


And setting
features.supportMultiQuery = true;

And implementing the following method in KeyValueStore?

Or are there any other changes that need to be made to implement the multiQuery feature for my backend?
The reason I feel multiQuery will help us is the following simple use case:
g.V(20520).as("root").bothE().as("e").barrier().otherV().as("oth").barrier().project("r","e","o").by(select("root").valueMap()).by(select("e").valueMap()).by(select("oth").valueMap()).barrier().dedup("r", "e", "o").profile()



Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
GraphStep(vertex,[20520])@[root]                                       1           1        1157.329     2.66
JanusGraphVertexStep(BOTH,edge)@[e]                                   12          12        3854.693     8.86
    \_condition=(EDGE AND visibility:normal)
    \_isFitted=true
    \_vertices=1
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@801a60ee
    \_orders=[]
    \_isOrdered=true
    \_multi=true
  optimization                                                                                 7.573
  backend-query                                                       12                     828.938
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@801a60ee
NoOpBarrierStep                                                       12          12           1.828     0.00
EdgeOtherVertexStep@[oth]                                             12          12           1.132     0.00
NoOpBarrierStep                                                       12          12           0.976     0.00
ProjectStep([r, e, o],[[SelectOneStep(last,root...                    12          12       38502.413    88.46
  SelectOneStep(last,root)                                            12          12           0.508
  PropertyMapStep(value)                                              12          12       12776.430
  SelectOneStep(last,e)                                               12          12           0.482
  PropertyMapStep(value)                                              12          12       15211.454
  SelectOneStep(last,oth)                                             12          12           0.376
  PropertyMapStep(value)                                              12          12       10508.737
NoOpBarrierStep                                                       12          12           5.692     0.01
DedupGlobalStep([r, e, o])                                            12          12           1.925     0.00
                                            >TOTAL                     -           -       43525.993        -

As you can see, retrieving the properties of graph elements (vertices and edges) is the most time-consuming step. On further analysis I realized this is because retrieving a single property from my backend is a single query to the backend. Thus, for N elements (vertices and edges), each with M properties, the total number of calls is N*M, which slows down the whole process and the execution time.
Maybe that's the reason why the properties() step is the slowest step with my backend.

So, will implementing multiQuery improve performance in such a scenario, and is there anything else that needs to be implemented as well? If so, I can quickly implement this, we can see some performance improvement right away, and adding the new backend moves closer to the finish line :-)

Thanks in advance.




Re: [DISCUSS] Developer chat (+FoundationDB chat)

Florian Hockmann <f...@...>
 

I created the channel: https://gitter.im/janusgraph/janusgraph-dev
You should all be able to join. I'll also create a PR to add it to our README.md.

On Wednesday, 15 April 2020 14:16:36 UTC+2, Florian Hockmann wrote:

It looks like we have a wide consensus on starting a new channel for development discussions, but different opinions on whether we should directly create a dedicated channel for FoundationDB and also on whether we should switch to a different system than Gitter.
So, I will create a new channel janusgraph-dev on Gitter and we can then see whether we need dedicated channels for, e.g., FoundationDB. If contributors have a strong opinion on moving to a different system, then please start a different thread for that so we can discuss it in general as Henry also already suggested. Since we didn't reach a consensus on this topic here, I don't want to let it stop the creation of the dev channel.

If that sounds OK to everyone, I will create the janusgraph-dev channel tomorrow on Gitter.


--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jan...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/CANgM2oN0Eh034Rw5qEwWZCFDk8-rJ2vG1bDejgs9LoddbVup9g%40mail.gmail.com.



Re: [DISCUSS] Developer chat (+FoundationDB chat)

Florian Hockmann <f...@...>
 

It looks like we have a wide consensus on starting a new channel for development discussions, but different opinions on whether we should directly create a dedicated channel for FoundationDB and also on whether we should switch to a different system than Gitter.
So, I will create a new channel janusgraph-dev on Gitter and we can then see whether we need dedicated channels for, e.g., FoundationDB. If contributors have a strong opinion on moving to a different system, then please start a different thread for that so we can discuss it in general as Henry also already suggested. Since we didn't reach a consensus on this topic here, I don't want to let it stop the creation of the dev channel.

If that sounds OK to everyone, I will create the janusgraph-dev channel tomorrow on Gitter.

On Tuesday, 14 April 2020 23:52:58 UTC+2, Henry Saputra wrote:

Sorry, I just saw this discussion. Thanks for pinging me, Misha.

As Misha and Florian mentioned, we did some investigating and exploring of which "chat" tool to use for JanusGraph.
We chose Gitter because, compared to Slack, it has a lower barrier to joining and starting discussions and needs less maintenance.

The discussion Florian started was about a new channel for development discussions on Gitter for JanusGraph, which I think is a great idea, so +1 for it.
This will give us a real-time mechanism, in addition to our mailing list, to talk about development ideas and progress.
As for FoundationDB, I think we could just use the dev channel to discuss it instead of a dedicated channel.

We could discuss moving to or embracing another chat tool like Slack or Discord in another thread as a separate topic.

Thanks,

- Henry




Re: [DISCUSS] Developer chat (+FoundationDB chat)

Henry Saputra <henry....@...>
 

Sorry, I just saw this discussion. Thanks for pinging me, Misha.

As Misha and Florian mentioned, we did some investigating and exploring of which "chat" tool to use for JanusGraph.
We chose Gitter because, compared to Slack, it has a lower barrier to joining and starting discussions and needs less maintenance.

The discussion Florian started was about a new channel for development discussions on Gitter for JanusGraph, which I think is a great idea, so +1 for it.
This will give us a real-time mechanism, in addition to our mailing list, to talk about development ideas and progress.
As for FoundationDB, I think we could just use the dev channel to discuss it instead of a dedicated channel.

We could discuss moving to or embracing another chat tool like Slack or Discord in another thread as a separate topic.

Thanks,

- Henry


On Tue, Apr 14, 2020 at 4:19 AM Florian Hockmann <f...@...> wrote:
I think Misha has good arguments for staying on Gitter. I personally don't think that we need a high entry barrier for a developer chat as we currently also don't get many non-dev questions in the Google group for developers. People instead ask in janusgraph-users (if they ask in a Google group that is) and I don't see why that should be different on Gitter as they can still ask their usage questions in the main Gitter chat that is already quite active.
We should also try to stay as open as possible with our development discussions to not exclude some people in my opinion. If we notice that we still get too many usage questions in a dev chat, then we can still create a private chat room in Gitter. (It looks at least from a quick search that it's possible to do that with Gitter.)

The other main argument against Gitter and pro Discord I see would be the voice chat, but I'm not sure how important that actually is for us.

Another topic that was discussed here was whether we actually need a dedicated channel just for FoundationDB. I don't have a strong opinion either way, but I suggest that we then simply start with a general dev channel and see whether we have a lot of FoundationDB specific discussions there and then we can still create a dedicated channel.



Re: [DISCUSS] Developer chat (+FoundationDB chat)

Florian Hockmann <f...@...>
 

I think Misha has good arguments for staying on Gitter. I personally don't think that we need a high entry barrier for a developer chat as we currently also don't get many non-dev questions in the Google group for developers. People instead ask in janusgraph-users (if they ask in a Google group that is) and I don't see why that should be different on Gitter as they can still ask their usage questions in the main Gitter chat that is already quite active.
We should also try to stay as open as possible with our development discussions to not exclude some people in my opinion. If we notice that we still get too many usage questions in a dev chat, then we can still create a private chat room in Gitter. (It looks at least from a quick search that it's possible to do that with Gitter.)

The other main argument against Gitter and pro Discord I see would be the voice chat, but I'm not sure how important that actually is for us.

Another topic that was discussed here was whether we actually need a dedicated channel just for FoundationDB. I don't have a strong opinion either way, but I suggest that we then simply start with a general dev channel and see whether we have a lot of FoundationDB specific discussions there and then we can still create a dedicated channel.

On Tuesday, 7 April 2020 18:34:24 UTC+2, Jan Jansen wrote:



On 7. Apr 2020, at 18:09, 'Misha Brukman' via JanusGraph developers <janusgr...@googlegroups.com> wrote:


On Tue, Apr 7, 2020 at 11:36 AM 'Jan Jansen' via JanusGraph developers <janusgr...@googlegroups.com> wrote:
Some of your reasons are exactly the reasons why I’m against Gitter.
Currently Gitter is just a support chat for JanusGraph. That channel would be hard to use for developer-focused talks. Therefore, we have created a private channel or highly moderated channel just for development-focused topics. I think having an extra platform where you have to sign up separately would massively reduce the support questions.

Are you trying to have the same separation on Gitter as we do with janusgraph-users@ and janusgraph-dev@ mailing lists? Would creating separate channels on Gitter not address this? Or are you saying you've already done this, and it's not helping?
I was thinking of the separation we have with the Google groups. I think just a different channel wouldn’t help, because some people simply ignore these rules. Take GitHub issues, for example: there is a template for questions that says to ask in the Google group, and people ignore it. In the Google group, you see it less often. I think an extra barrier between the two channels would separate these topics better.

I didn’t know about the history limit in Slack.


The free version of Slack limits the workspace to 10000 recent messages; however, my understanding is that this is a global number, not a per-channel number. In some free workspaces I've seen, for less active channels, this means ~zero history because other channels are so active, that they continuously exhaust the 10K most recent messages, so if you're not there to see the message, you'll see almost nothing.
 
I also mentioned Discord, which also includes voice channels.

I didn't realize folks would enjoy live voice chat, but it does sound like an interesting option.

Misha 



Re: Batching Queries to backend for faster performance

Debasish Kanhar <d.k...@...>
 

For anyone following this thread: my primary question was how to implement a multi-get style of access for my backend. See Marko's comment (https://groups.google.com/d/msg/gremlin-users/QMVhLIPiGRE/Yf4ByrlrEQAJ) for clarification on what multi-get is.

The Snowflake backend interface I've written doesn't support multiQuery yet. Is implementing multiQuery as simple as the steps below?

Do we need to uncomment the following in the StoreManager class?


And set
features.supportMultiQuery = true;

And implement the following method in KeyValueStore?

Or are there any other changes that need to be made to implement the multiQuery feature for my backend?
The reason I think multiQuery will help us is that, in our simple use case:
g.V(20520).as("root").bothE().as("e").barrier().otherV().as("oth").barrier().project("r","e","o").by(select("root").valueMap()).by(select("e").valueMap()).by(select("oth").valueMap()).barrier().dedup("r", "e", "o").profile()



Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
GraphStep(vertex,[20520])@[root]                                       1           1        1157.329     2.66
JanusGraphVertexStep(BOTH,edge)@[e]                                   12          12        3854.693     8.86
    \_condition=(EDGE AND visibility:normal)
    \_isFitted=true
    \_vertices=1
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@801a60ee
    \_orders=[]
    \_isOrdered=true
    \_multi=true
  optimization                                                                                 7.573
  backend-query                                                       12                     828.938
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@801a60ee
NoOpBarrierStep                                                       12          12           1.828     0.00
EdgeOtherVertexStep@[oth]                                             12          12           1.132     0.00
NoOpBarrierStep                                                       12          12           0.976     0.00
ProjectStep([r, e, o],[[SelectOneStep(last,root...                    12          12       38502.413    88.46
  SelectOneStep(last,root)                                            12          12           0.508
  PropertyMapStep(value)                                              12          12       12776.430
  SelectOneStep(last,e)                                               12          12           0.482
  PropertyMapStep(value)                                              12          12       15211.454
  SelectOneStep(last,oth)                                             12          12           0.376
  PropertyMapStep(value)                                              12          12       10508.737
NoOpBarrierStep                                                       12          12           5.692     0.01
DedupGlobalStep([r, e, o])                                            12          12           1.925     0.00
                                            >TOTAL                     -           -       43525.993        -

As you can see, retrieving the properties of graph elements (vertices and edges) is the most time-consuming step. On further analysis I realized this is because retrieving a single property from my backend is a single query to the backend. Thus, for N elements (vertices and edges), each with M properties, the total number of calls is N*M, which slows down the whole process and the execution time.
Maybe that's the reason why the properties() step is the slowest step with my backend.
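
As a rough illustration of the N*M effect, here is a back-of-the-envelope model in plain Java. The class and method names are hypothetical, and the per-call latency and the number of properties per element are assumptions, not measurements from the profile above.

```java
// Back-of-the-envelope model of backend round trips for property retrieval.
// The class is hypothetical; latency and property counts passed in are
// assumptions chosen by the caller, not values reported by JanusGraph.
class PropertyFetchCost {
    // One backend query per property of every element: N * M calls.
    static long perPropertyCalls(long elements, long propertiesPerElement) {
        return elements * propertiesPerElement;
    }

    // Elements grouped into batches, with one backend query per batch.
    static long batchedCalls(long elements, long batchSize) {
        return (elements + batchSize - 1) / batchSize; // ceiling division
    }

    // Total time if every call costs the same fixed round-trip latency.
    static long totalMillis(long calls, long latencyMillisPerCall) {
        return calls * latencyMillisPerCall;
    }
}
```

The profile shows three PropertyMapStep rows with 12 traversers each, so N = 36 valueMap evaluations. With an assumed M = 10 properties and an assumed 100 ms per query, the per-property model gives 360 calls and 36,000 ms, while a single batch covering all 36 elements would be 1 call and 100 ms under the same assumption. This ignores the extra work the backend does for a larger query, so it is an upper bound on the benefit, not a prediction.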

So, will implementing multiQuery improve performance in such a scenario, and is there anything else that needs to be implemented as well? If so, I can quickly implement this, we can see some performance improvement right away, and adding the new backend moves closer to the finish line :-)

Thanks in advance.

On Tuesday, 7 April 2020 00:18:38 UTC+5:30, Debasish Kanhar wrote:
Hi All,

Well the title may be misleading, as I couldnt think of better title. Let me give a brief about the issue we are talking about, the possible solutions we are thinking, and will need your suggestions and help to connect anyone in community who can help us with the problem :-)

So, we have a requirement where we want to implement Snowflake as backend for JanusGraph. (https://groups.google.com/forum/#!topic/janusgraph-dev/9JrMYF_01Cc) . We were able to model Snowflake as KeyValueStore and we were successfully able to create an Interface layer which extends OrderedKeyValueStore to interact with Snowflake. (https://gitlab.com/system-soft-technologies-opensource/janus-snowflake) . The problem we face now is anticipated, with respect to slower response times. May it be READ or WRITE. Because, for every Gremlin query which Tinkerpop/Janusgraph issues, its broken down into multiple queries which are executed one after other in sequential other to build a response to Gremlin query.

For example, look at the attached file (query breakdown.txt). It shows how a simple Gremlin query like g.V().has("node_label", "user").limit(5).valueMap(true) is broken down into a set of multiple edgestore queries. (I'm not including queries to graphindex and janusgraph_ids, as those are low in volume.) We have also been able to capture the order in which the queries are executed. (The 1st line is the 1st query, the 2nd line is the 2nd query, and so on.)

My question is: is there some way we can batch the queries here? Since Snowflake is a data warehouse, each query takes 100s of ms to execute. Thus, for example, 100 sub-queries like those in the example file easily take 10 seconds minimum. We would like to optimize that by batching the queries, so that they can be executed together and their responses reconciled together.

For example if the flow is as follows:

Can we change the flow above, which is the generic flow of TinkerPop databases, to something like the flow below by bringing in an accumulator/aggregator step?

Instead of directly interacting with the Snowflake backend through our interface, we bring in an aggregation step in between.
The aggregation step accumulates all the getSlice queries (StartKey, EndKey and store name) until all queries which can be compartmentalized are accumulated.
Once accumulated, it executes all of them together against the backend.
Once executed, we get all the queries' responses back at the aggregation step (output), break them down according to the input queries, and send them back to GraphStep for reconciliation and building the output of the Gremlin query.
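
The proposed aggregation step can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not JanusGraph code: `RangeQuery`, `SliceAggregator` and `FakeWarehouse` are hypothetical names, the backend is an in-memory map, and the sketch assumes the accumulated queries are independent of each other (a query whose key range depends on the result of an earlier query cannot be held back this way).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical description of one getSlice-style request: a store name plus a key range.
final class RangeQuery {
    final String store;
    final String startKey; // inclusive
    final String endKey;   // exclusive

    RangeQuery(String store, String startKey, String endKey) {
        this.store = store;
        this.startKey = startKey;
        this.endKey = endKey;
    }
}

// Accumulates queries, runs them in one backend call, then splits the response per query.
final class SliceAggregator {
    private final List<RangeQuery> pending = new ArrayList<>();
    private final Function<List<RangeQuery>, List<List<String>>> batchExecutor;

    SliceAggregator(Function<List<RangeQuery>, List<List<String>>> batchExecutor) {
        this.batchExecutor = batchExecutor;
    }

    // Step 1: accumulate instead of executing immediately.
    void add(RangeQuery query) {
        pending.add(query);
    }

    // Steps 2 and 3: execute everything together, then map each result to its input query.
    Map<RangeQuery, List<String>> flush() {
        List<RangeQuery> batch = new ArrayList<>(pending);
        pending.clear();
        Map<RangeQuery, List<String>> byQuery = new LinkedHashMap<>();
        if (batch.isEmpty()) {
            return byQuery; // nothing to send, so no backend call
        }
        List<List<String>> results = batchExecutor.apply(batch); // one round trip
        if (results.size() != batch.size()) {
            throw new IllegalStateException("backend must return one result per query");
        }
        for (int i = 0; i < batch.size(); i++) {
            byQuery.put(batch.get(i), results.get(i));
        }
        return byQuery;
    }
}

// In-memory stand-in for the warehouse; it ignores the store name and counts batch calls.
final class FakeWarehouse {
    final TreeMap<String, String> rows = new TreeMap<>();
    int calls = 0;

    List<List<String>> runBatch(List<RangeQuery> batch) {
        calls++;
        List<List<String>> out = new ArrayList<>();
        for (RangeQuery q : batch) {
            out.add(new ArrayList<>(rows.subMap(q.startKey, q.endKey).values()));
        }
        return out;
    }
}
```

A caller would construct `new SliceAggregator(warehouse::runBatch)`, call `add` for each range query, and call `flush` once to get the per-query results back. How a real backend would combine the ranges into one statement, and where in the traversal it is safe to defer execution, is left open here.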

As for what we have been doing: we edited the JanusGraph core classes so that we can track the flow of information from one class to another whenever a Gremlin query is executed. That way we know which classes are called, iteratively, until we reach our interface's getSlice method, and we can look for repetitive patterns in the query. For that we have produced approximately 6000 lines of custom logs which we are tracking.
After analyzing the logs, we arrived at the following flow of classes:

My question is: is this possible from a TinkerPop perspective? From a JanusGraph perspective? Our project is ready to pay any JanusGraph or TinkerPop experts part-time as freelancers. We are looking for any experts in the domain who can help us solve this problem. The potential payoff of this use case is tremendous: it could also improve performance in existing backends, and help us execute a lot of memory-intensive queries a lot faster.

Thanks



[RESULT][VOTE] JanusGraph 0.5.1 release

Oleksandr Porunov <alexand...@...>
 

This vote is now closed with a total of 4 +1s, no +0s and no -1s. The results are:

BINDING VOTES:

+1  (3 -- Oleksandr Porunov, Jan Jansen, Florian Hockmann)
0   (0)
-1  (0)

NON-BINDING VOTES:

+1 (1 -- Nicolas)
0  (0)
-1 (0)

Thank you very much,
Oleksandr Porunov


Re: [DISCUSS] Developer chat (+FoundationDB chat)

Jan Jansen <faro...@...>
 



On 7. Apr 2020, at 18:09, 'Misha Brukman' via JanusGraph developers <janus...@...> wrote:


On Tue, Apr 7, 2020 at 11:36 AM 'Jan Jansen' via JanusGraph developers <janusgr...@...> wrote:
Some of your reasons are exactly why I’m against Gitter. 
Currently, Gitter is just a support chat for JanusGraph. It would be hard to use this channel for developer-focused talks. Therefore, we have created a private channel or highly moderated channel just for development-focused topics. I think having an extra platform where you have to sign up separately would reduce the support questions massively. 

Are you trying to have the same separation on Gitter as we do with janusgraph-users@ and janusgraph-dev@ mailing lists? Would creating separate channels on Gitter not address this? Or are you saying you've already done this, and it's not helping?
I thought about the separation we have with the Google Groups. I think just a different channel wouldn’t help because some people simply ignore these rules. For example, for GitHub issues there is a template for questions which says to please ask in the Google Group, and people ignore it. In the Google Group, you see it less often. I think an extra barrier between both channels would separate these topics better.
 
I didn’t know about the history limit in Slack.


The free version of Slack limits the workspace to 10000 recent messages; however, my understanding is that this is a global number, not a per-channel number. In some free workspaces I've seen, for less active channels, this means ~zero history because other channels are so active, that they continuously exhaust the 10K most recent messages, so if you're not there to see the message, you'll see almost nothing.
 
I also mentioned Discord, which includes voice channels as well.

I didn't realize folks would enjoy live voice chat, but it does sound like an interesting option.

Misha 

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgr...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/CANgM2oN0Eh034Rw5qEwWZCFDk8-rJ2vG1bDejgs9LoddbVup9g%40mail.gmail.com.


Re: [DISCUSS] Developer chat (+FoundationDB chat)

Misha Brukman <mbru...@...>
 

On Tue, Apr 7, 2020 at 11:36 AM 'Jan Jansen' via JanusGraph developers <janusgr...@...> wrote:
Some of your reasons are exactly why I’m against Gitter. 
Currently, Gitter is just a support chat for JanusGraph. It would be hard to use this channel for developer-focused talks. Therefore, we have created a private channel or highly moderated channel just for development-focused topics. I think having an extra platform where you have to sign up separately would reduce the support questions massively. 

Are you trying to have the same separation on Gitter as we do with janusgraph-users@ and janusgraph-dev@ mailing lists? Would creating separate channels on Gitter not address this? Or are you saying you've already done this, and it's not helping?
 
I didn’t know about the history limit in Slack.


The free version of Slack limits the workspace to 10000 recent messages; however, my understanding is that this is a global number, not a per-channel number. In some free workspaces I've seen, for less active channels, this means ~zero history because other channels are so active, that they continuously exhaust the 10K most recent messages, so if you're not there to see the message, you'll see almost nothing.
 
I also mentioned Discord, which includes voice channels as well.

I didn't realize folks would enjoy live voice chat, but it does sound like an interesting option.

Misha 


Re: [DISCUSS] Developer chat (+FoundationDB chat)

Jan Jansen <faro...@...>
 

Hi Misha,
Some of your reasons are exactly why I’m against Gitter. 
Currently, Gitter is just a support chat for JanusGraph. It would be hard to use this channel for developer-focused talks. Therefore, we have created a private channel or highly moderated channel just for development-focused topics. I think having an extra platform where you have to sign up separately would reduce the support questions massively. 

I didn’t know about the history limit in Slack. I also mentioned Discord, which includes voice channels as well.

Greetings,
Jan

On 7. Apr 2020, at 16:00, 'Misha Brukman' via JanusGraph developers <janus...@...> wrote:


FWIW, when we originally started JanusGraph, Henry Saputra (cc'd) led an extensive evaluation of various chat options and we settled on Gitter, primarily because:
  • unlike Slack, Gitter doesn't require running a bot to have people join a workspace: by default, Slack is closed to sign-ups; you can whitelist some domains, but you can't make it "public" so you end up having to run a bot like SlackinSlackEngine or similar (with appropriate credentials) just so that folks can join the chat
  • you have to create another account just for Slack — with Gitter, you can just trivially sign-in via OAuth, using your existing GitHub or Twitter account
  • on the free plan, Slack limits how much history you can see, which decreases visibility for folks who are not signed in 24/7; Gitter provides full free access to the entire history
There may have been other issues which I'm forgetting; Henry, please add what I missed.

Overall, I would recommend that we not splinter the chat rooms across more than 1 service: since we're already using Gitter for chat, can't we just add another room there for FoundationDB or any other topic? It also has good support for Markdown and code formatting; I'm not sure what exactly we're missing from Slack with Gitter.

Misha

On Tue, Apr 7, 2020 at 3:52 AM Debasish Kanhar <d.k...@...> wrote:
Sounds good Jan.

Let's go with the option that the most developers/committers are involved in or use, so that users posting questions can reach as much of our community as possible :-)

On Tuesday, 7 April 2020 12:03:33 UTC+5:30, Jan Jansen wrote:
Hi

We have three different chat options. If no one wants to add anything, I will open a vote to decide which platform we want to use. (This will happen tomorrow.)

Here are the options again:
  • Gitter
  • Slack
  • Discord



Re: [VOTE] JanusGraph 0.5.1 release

Florian Hockmann <f...@...>
 

So, we keep this VOTE thread for this release?

In that case: I tested the new distributions by starting JanusGraph Server with the in-memory backend on both as described in the docs and then connected successfully with the Gremlin Console via remote to that. For the full distribution, I also tested the same with bin/janusgraph.sh. Everything worked as expected.

I therefore change my VOTE to +1.

Am Samstag, 4. April 2020 19:23:58 UTC+2 schrieb Oleksandr Porunov:

gremlin-server has been fixed. The release jars, Sonatype artifacts and git tag have been updated.

On Wednesday, April 1, 2020 at 12:30:31 PM UTC-7, Oleksandr Porunov wrote:
I have submitted a PR to fix the problem with the default gremlin-server: https://github.com/JanusGraph/janusgraph/pull/2067

I also checked that gremlin-server has the same problem in version 0.4.0. Also, it doesn't even start by default in version 0.3.1. So, this bug has been here for a long time.
I also agree with your suggestions about the naming of releases. Maybe it is better to keep `janusgraph-0.5.1.zip` as the full JanusGraph version and `janusgraph-0.5.1-truncated.zip` as the truncated version.

On Wednesday, April 1, 2020 at 9:54:50 AM UTC-7, Florian Hockmann wrote:
It fails for both versions. The big problem I see with this is that our docs say about this:

By default, this will instruct JanusGraph to use it's own inmemory backend instead of a dedicated database server.

so it should really not expect some externally running Cassandra or Elasticsearch. This problem was probably already present at least in 0.5.0, but I think that it gets worse with this release because we now have the truncated zip archive as the default distribution (it's named just janusgraph-[version].zip after all and not janusgraph-[version]-truncated.zip) and that no longer contains a way to easily start JanusGraph Server as described in the docs. For the full distribution, users could at least switch back to janusgraph.sh which should still work.
That is why I think that we should really fix that before releasing a truncated distribution or if we really want to release now, then we should treat the full archive as the default distribution and the truncated one as an addition for users who want to save some disk space and who know that they have to manage backends on their own.

Am Mittwoch, 1. April 2020 18:21:14 UTC+2 schrieb Oleksandr Porunov:
Florian, thank you for catching it. I didn't check yet, but does "./bin/gremlin-server.sh start" fail on both the truncated and full JanusGraph versions? I.e. I assume you should run CQL and ES externally when you are using the truncated JanusGraph version. We already have information in our documentation that "This requires to download janusgraph-full-0.5.1.zip instead of the default janusgraph-0.5.1.zip". I.e. I assume that it is OK that the default ./bin/gremlin-server.sh start fails in the truncated version because it is described in the documentation.
Also, if it fails in the full version as well, does it work in any JanusGraph version (i.e. 0.4.0 or 0.3.0)? If it doesn't work with older JanusGraph versions, then we aren't introducing any new bugs and thus the bugfix can be postponed to a future release (unless the bug can be fixed quickly or the bug is critical). That said, I haven't yet checked the bugs you have described; I will try to check them later.

On Wednesday, April 1, 2020 at 7:26:46 AM UTC-7, Florian Hockmann wrote:
I just tried both zip archives, following our docs on local installation that simply starts JanusGraph Server with ./bin/gremlin-server.sh start. This fails silently which can be noticed when one tries to connect to that from the console with :remote connect as that results in this error message: gremlin-groovy is not an available GremlinScriptEngine

This seems to be caused by the fact that gremlin-server.sh uses the default gremlin-server.yaml which uses CQL and ES.

If we now point users at gremlin-server.sh to get started instead of janusgraph.sh as that isn't included in the truncated version, then we need to make sure that this actually works out of the box.

So, my VOTE is -1.

Am Samstag, 28. März 2020 23:27:49 UTC+1 schrieb Jan Jansen:
Hi,
I have tested the truncated binary distribution, using a pre-built Docker image of JanusGraph.
My vote is +1.

Greetings,
Jan

On Thursday, March 26, 2020 at 2:21:41 AM UTC+1, Oleksandr Porunov wrote:
Hello,

We are happy to announce that JanusGraph 0.5.1 is ready for release.

The release artifacts can be found at this location:
        https://github.com/JanusGraph/janusgraph/releases/tag/v0.5.1

A full binary distribution is provided for user convenience:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.5.1/janusgraph-full-0.5.1.zip

A truncated binary distribution is provided:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.5.1/janusgraph-0.5.1.zip

The GPG key used to sign the release artifacts is available at:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.5.1/KEYS

The docs can be found here:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.5.1/janusgraph-0.5.1-doc.zip

The release tag in Git can be found here:
        https://github.com/JanusGraph/janusgraph/tree/v0.5.1

The release notes are available here:
        https://github.com/JanusGraph/janusgraph/blob/v0.5/docs/changelog.md#version-051-release-date-march-25-2020

This [VOTE] will be open for the next 3 days --- closing Saturday, March 29, 2020 at 01:30 AM GMT.
All are welcome to review and vote on the release, but only votes from TSC members are binding.
My vote is +1.

Thank you,
Oleksandr Porunov