
Re: Official support for FoundationDB?

f.gri...@...
 

I'm of course interested and would like to offer my support for everything that needs to be done.

-- Florian



Re: Official support for FoundationDB?

Jason Plurad <plu...@...>
 

Sounds like there is some interest here from Chris (blindenvy) and Yuvaraj (skyrocknroll) on a common fork.

rngcntr -- Sorry, I didn't see your first name... are you interested as well?

I'd think the next steps would be to get a vote approved first; then we'd need a repo to seed the project. There are a few options on how to proceed with that, but let's get the above answered first before moving to a vote.

-- Jason

On Friday, June 5, 2020 at 9:51:01 PM UTC-4, Yuvaraj Loganathan wrote:
We would be really interested in a common fork.


Janusgraph functionality issue

shivain...@...
 

Hi,

I have noticed corruption in the index after reindexing a vertex-centric index with direction 'IN'.

Issue: a self-link is added to every vertex after reindexing.

To demonstrate this, I have taken a very small graph consisting of two vertices (A & B) and one edge connecting the two.
Edge Label -> link
Edge has one property key (Integer) -> assocKind

//Creating the Vertex Centric Index
gremlin> edgeLabel = mgmt.getEdgeLabel("link");
gremlin> assocKind = mgmt.getPropertyKey("assocKind")
gremlin> mgmt.buildEdgeIndex(edgeLabel, "myVertexCentricIndex", Direction.IN, assocKind);

Please note that I have specified direction IN for the index.


//Creating the Edge from A to B: (a and b are vertices)
a.addEdge("link",b,"assocKind",1)

Output Before Reindexing :
gremlin> g.V().has('name' , 'A').inE().hasLabel('link').has('assocKind',1)
//no IN edges to vertex A  Correct

gremlin> g.V().has('name' , 'A').outE().hasLabel('link').has('assocKind',1)
==>e[4e1f-b6g-1bit-1pqw][14488-link->80024]   Correct

Now I ran the reindex command
// 'index' is the vertex centric index which is created above
gremlin> mgmt.updateIndex(index, SchemaAction.REINDEX).get()

Output After Reindexing :
gremlin> g.V().has('name' , 'A').inE().hasLabel('link').has('assocKind',1)
==>e[4e1f-b6g-1bit-b6g][14488-link->14488] Wrong

gremlin> g.V().has('name' , 'A').bothE().hasLabel('link').has('assocKind',1)
==>e[4e1f-b6g-1bit-1pqw][14488-link->80024]
==>e[4e1f-b6g-1bit-b6g][14488-link->14488] Unexpected Link
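For completeness, the whole scenario can be reproduced in the Gremlin console roughly as follows (a sketch assuming an empty graph; schema creation for 'name' and the index lifecycle steps are simplified):

// schema + vertex-centric index
mgmt = graph.openManagement()
link = mgmt.makeEdgeLabel("link").make()
assocKind = mgmt.makePropertyKey("assocKind").dataType(Integer.class).make()
mgmt.makePropertyKey("name").dataType(String.class).make()
mgmt.buildEdgeIndex(link, "myVertexCentricIndex", Direction.IN, assocKind)
mgmt.commit()

// data: one edge from A to B
a = graph.addVertex("name", "A"); b = graph.addVertex("name", "B")
a.addEdge("link", b, "assocKind", 1)
graph.tx().commit()

// reindex the vertex-centric index
mgmt = graph.openManagement()
index = mgmt.getRelationIndex(mgmt.getEdgeLabel("link"), "myVertexCentricIndex")
mgmt.updateIndex(index, SchemaAction.REINDEX).get()
mgmt.commit()

// expected: no IN edges on A; observed after reindex: a self-link
g.V().has("name", "A").inE("link").has("assocKind", 1)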

This issue happens in JanusGraph 0.5.2 as well.

Thanks
Shiva


Edge Filter in Janusgraph when working with Spark

rafi ansari <rafi1...@...>
 

Hi All,

I am currently working with JanusGraph in batch mode using Spark.

I am facing a problem filtering edges by label.

Below are the specifications:
Spark = 2.4.5
Janusgraph = 0.5.0

Below is the configuration file for Spark:

conf.setProperty("gremlin.graph", "org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph")
conf.setProperty("gremlin.hadoop.graphReader", "org.janusgraph.hadoop.formats.cql.CqlInputFormat")
conf.setProperty("gremlin.hadoop.graphWriter", "org.apache.hadoop.mapreduce.lib.output.NullOutputFormat")
conf.setProperty("spark.cassandra.connection.host", "127.0.0.1")
conf.setProperty("janusgraphmr.ioformat.conf.storage.backend", "cql")
conf.setProperty("janusgraphmr.ioformat.conf.storage.hostname", "127.0.0.1")
conf.setProperty("janusgraphmr.ioformat.conf.storage.port", 9042)
conf.setProperty("janusgraphmr.ioformat.conf.storage.cql.keyspace", "graph_db_1")
conf.setProperty("janusgraphmr.ioformat.conf.index.search.backend", "elasticsearch")
conf.setProperty("janusgraphmr.ioformat.conf.index.search.hostname", "127.0.0.1")
conf.setProperty("janusgraphmr.ioformat.conf.index.search.port", 9200)
conf.setProperty("janusgraphmr.ioformat.conf.index.search.index-name", "graph_1")
conf.setProperty("cassandra.input.partitioner.class","org.apache.cassandra.dht.Murmur3Partitioner")
conf.setProperty("cassandra.input.widerows",true)
conf.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.setProperty("spark.kryo.registrator", "org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator")

Below is the Spark code using newAPIHadoopRDD 


val hadoopConfiguration = ConfUtil.makeHadoopConfiguration(conf)

val rdd: RDD[(NullWritable, VertexWritable)] = spark.sparkContext.newAPIHadoopRDD(
  hadoopConfiguration,
  hadoopConfiguration.getClass(Constants.GREMLIN_HADOOP_GRAPH_READER,
      classOf[InputFormat[NullWritable, VertexWritable]])
    .asInstanceOf[Class[InputFormat[NullWritable, VertexWritable]]],
  classOf[NullWritable],
  classOf[VertexWritable])

The above lines give an RDD as output.

rdd: org.apache.spark.rdd.RDD[(org.apache.hadoop.io.NullWritable, org.apache.tinkerpop.gremlin.hadoop.structure.io.VertexWritable)]

rdd.map { case (x, y) => y.asInstanceOf[VertexWritable] }

res17: Array[String] = Array(v[8344], v[12440], v[4336], v[4320], v[4136], v[8416], v[8192], v[4248], v[4344], v[8432], v[12528], v[4096])

From res17 above, I am not sure how to filter the edges by label.
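For what it's worth, one possible way to get at the edges (an untested sketch: each VertexWritable wraps a StarGraph vertex, so the standard TinkerPop Vertex API applies; "link" is an assumed edge label):

import org.apache.tinkerpop.gremlin.structure.Direction
import scala.collection.JavaConverters._

// unwrap the star vertex from each VertexWritable and keep only "link" edges;
// note the edges must be Kryo-serializable if they cross a shuffle
val linkEdges = rdd.flatMap { case (_, vw) =>
  vw.get().edges(Direction.BOTH, "link").asScala
}
linkEdges.count()

Alternatively, the same filter can be expressed as an OLAP traversal via SparkGraphComputer, e.g. g.E().hasLabel("link"), without touching the RDD directly.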


TIA

Regards

Rafi


Re: Official support for FoundationDB?

Yuvaraj Loganathan <uva...@...>
 

We would be really interested in a common fork.


Re: Official support for FoundationDB?

Jason Plurad <plu...@...>
 

+1. I support having a janusgraph-foundationdb repository under the main JanusGraph org on GitHub. There is precedent for this with janusgraph-docker, language-specific client bindings (janusgraph-dotnet and janusgraph-python), and janusgraph-ambari.

It comes down to whether there is a consensus desire to collaborate instead of having separate forks. There would be some work for those fork authors to agree on how to merge their forks into the one that will serve as the initial repo. There would be more restrictions on getting code committed, since the project would be expected to have a code review and approval process in place, like the other JanusGraph repos. Keep in mind that having a repo under the JanusGraph org doesn't mean that it needs to be production-worthy from the start.

Thanks to both of you for getting the ball rolling here with FoundationDB. Happy to hear more feedback/discussion from the community.

-- Jason

On Saturday, May 30, 2020 at 5:57:13 PM UTC-4, Christopher Jackson wrote:
I agree we would have to clearly state all limitations the adapter has so everyone is aware of them. 

Would love to hear additional feedback from the community on the topic of making this adapter official. 

On Monday, May 25, 2020 at 3:21:22 AM UTC-4, f....@... wrote:
Hi Christopher,
thanks for offering your support!

I did indeed put a lot of effort into my fork (rngcntr), but I worked alone and rewrote a significant amount of code, so it definitely lacks an in-depth review. Other than that, most of what I did was stress testing and finding bugs that need to be fixed. There are still some open issues, but all in all I'm quite happy with the current state of the storage adapter.

What still worries me are the effects of FDB's transaction time limit. Officially supporting FDB as a storage backend would currently leave the user in charge of handling these problems. Retrying cannot solve everything, as some queries will never make it through the 5s limit even when retried over and over. As a result, the FDB adapter only supports a subset of all possible queries, and this should be stated clearly if it is officially released.


Re: Official support for FoundationDB?

Christopher Jackson <jackson.ch...@...>
 

I agree we would have to clearly state all limitations the adapter has so everyone is aware of them. 

Would love to hear additional feedback from the community on the topic of making this adapter official. 


On Monday, May 25, 2020 at 3:21:22 AM UTC-4, f....@... wrote:
Hi Christopher,
thanks for offering your support!

I did indeed put a lot of effort into my fork (rngcntr), but I worked alone and rewrote a significant amount of code, so it definitely lacks an in-depth review. Other than that, most of what I did was stress testing and finding bugs that need to be fixed. There are still some open issues, but all in all I'm quite happy with the current state of the storage adapter.

What still worries me are the effects of FDB's transaction time limit. Officially supporting FDB as a storage backend would currently leave the user in charge of handling these problems. Retrying cannot solve everything, as some queries will never make it through the 5s limit even when retried over and over. As a result, the FDB adapter only supports a subset of all possible queries, and this should be stated clearly if it is officially released.


Re: Official support for FoundationDB?

f.gri...@...
 

Hi Christopher,
thanks for offering your support!

I did indeed put a lot of effort into my fork (rngcntr), but I worked alone and rewrote a significant amount of code, so it definitely lacks an in-depth review. Other than that, most of what I did was stress testing and finding bugs that need to be fixed. There are still some open issues, but all in all I'm quite happy with the current state of the storage adapter.

What still worries me are the effects of FDB's transaction time limit. Officially supporting FDB as a storage backend would currently leave the user in charge of handling these problems. Retrying cannot solve everything, as some queries will never make it through the 5s limit even when retried over and over. As a result, the FDB adapter only supports a subset of all possible queries, and this should be stated clearly if it is officially released.


Official support for FoundationDB?

jackson.ch...@...
 

Hi everyone,

It seems there may be a good deal of interest in having support for FoundationDB as a storage backend for JanusGraph. Ted Wilmes created a storage adapter, https://github.com/experoinc/janusgraph-foundationdb, that hasn't seen any updates from Ted for some time; however, other teams have picked up where Ted left off, with a few forks seeing some activity:

https://github.com/rngcntr/janusgraph-foundationdb
https://github.com/skyrocknroll/janusgraph-foundationdb
https://github.com/blindenvy/janusgraph-foundationdb

I wanted to start a conversation about working towards getting this adapter added to the main JanusGraph code base and making FDB an officially supported backend.

Personally, I am interested in seeing this happen, as I work for IBM and we are planning on using JanusGraph on top of FoundationDB for a few projects. The https://github.com/blindenvy/janusgraph-foundationdb fork is where some of our team members have started to make changes. We originally forked from the experoinc repo and only noticed the other forks after the fact. It seems both the rngcntr and skyrocknroll forks have seen a good deal of changes and may be better choices as a starting point to bring into the official code base.

If the community decides it wants to do this, I would like to offer my assistance in helping bring the code in and adding documentation to jump-start the process.

I look forward to hearing everyone's thoughts on this topic.

Regards,
Christopher Jackson


Read or write in JanusGraph with Spark GraphFrames

rafi ansari <rafi1...@...>
 

Hi all,

Is there any documentation on how to connect Spark GraphFrames to JanusGraph for bulk read/write operations?

The JanusGraph docs (https://docs.janusgraph.org/advanced-topics/hadoop/) show how to do OLAP with Gremlin, but I could not find any documentation or example doing OLAP with Spark GraphFrames.
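For what it's worth, one conceivable route (an untested sketch building on the newAPIHadoopRDD approach from the "Edge Filter" thread above, where rdd is an RDD[(NullWritable, VertexWritable)]; column names follow the GraphFrames convention):

import org.apache.tinkerpop.gremlin.structure.Direction
import org.graphframes.GraphFrame
import scala.collection.JavaConverters._
import spark.implicits._

// vertex side: (id, label)
val vertexDF = rdd.map { case (_, vw) =>
  (vw.get().id().toString, vw.get().label())
}.toDF("id", "label")

// edge side: (src, dst, label), taken from each star vertex's OUT edges
val edgeDF = rdd.flatMap { case (_, vw) =>
  vw.get().edges(Direction.OUT).asScala.map { e =>
    (e.outVertex().id().toString, e.inVertex().id().toString, e.label())
  }
}.toDF("src", "dst", "label")

val gf = GraphFrame(vertexDF, edgeDF)
gf.degrees.show()  // sanity check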

Thanks

Rafi


[RESULT][VOTE] JanusGraph 0.5.2 release

Oleksandr Porunov <alexand...@...>
 

This vote is now closed with a total of 3 +1s, no +0s and no -1s. The results are:

BINDING VOTES:

+1  (3 -- Oleksandr Porunov, Florian Hockmann, Jan Jansen)
0   (0)
-1  (0)

NON-BINDING VOTES:

+1 (0)
0  (0)
-1 (0)

Thank you very much,
Oleksandr Porunov


Re: [VOTE] JanusGraph 0.5.2 release

Jan Jansen <faro...@...>
 

I did a quick test of both binary distributions. VOTE +1

I think we could work on releasing an extra distribution for HBase 1, which is used for Google Bigtable, if I'm correct.


Re: [VOTE] JanusGraph 0.5.2 release

Florian Hockmann <f...@...>
 

I checked the release notes and did a quick test of both binary distributions. VOTE +1

On Wednesday, May 6, 2020 at 02:51:49 UTC+2, Oleksandr Porunov wrote:

Hello,

We are happy to announce that JanusGraph 0.5.2 is ready for release.

The release artifacts can be found at this location:
        https://github.com/JanusGraph/janusgraph/releases/tag/v0.5.2

A full binary distribution is provided for user convenience:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.5.2/janusgraph-full-0.5.2.zip

A truncated binary distribution is provided:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.5.2/janusgraph-0.5.2.zip

The GPG key used to sign the release artifacts is available at:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.5.2/KEYS

The docs can be found here:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.5.2/janusgraph-0.5.2-doc.zip

The release tag in Git can be found here:
        https://github.com/JanusGraph/janusgraph/tree/v0.5.2

The release notes are available here:
        https://github.com/JanusGraph/janusgraph/blob/v0.5/docs/changelog.md#version-052-release-date-may-3-2020

This [VOTE] will open for the next 3 days --- closing Saturday, May 9, 2020 at 12:50 AM GMT.
All are welcome to review and vote on the release, but only votes from TSC members are binding.
My vote is +1.

Thank you,
Oleksandr Porunov


[VOTE] JanusGraph 0.5.2 release

Oleksandr Porunov <alexand...@...>
 

Hello,

We are happy to announce that JanusGraph 0.5.2 is ready for release.

The release artifacts can be found at this location:
        https://github.com/JanusGraph/janusgraph/releases/tag/v0.5.2

A full binary distribution is provided for user convenience:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.5.2/janusgraph-full-0.5.2.zip

A truncated binary distribution is provided:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.5.2/janusgraph-0.5.2.zip

The GPG key used to sign the release artifacts is available at:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.5.2/KEYS

The docs can be found here:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.5.2/janusgraph-0.5.2-doc.zip

The release tag in Git can be found here:
        https://github.com/JanusGraph/janusgraph/tree/v0.5.2

The release notes are available here:
        https://github.com/JanusGraph/janusgraph/blob/v0.5/docs/changelog.md#version-052-release-date-may-3-2020

This [VOTE] will open for the next 3 days --- closing Saturday, May 9, 2020 at 12:50 AM GMT.
All are welcome to review and vote on the release, but only votes from TSC members are binding.
My vote is +1.

Thank you,
Oleksandr Porunov


JanusGraph (Cassandra + ES), OLAP bulk ingestion, issues with ES (SecondaryPersistence used to store indexes) commit and rollback functionality

Ramesh Babu Y <ramesh...@...>
 

We are using JanusGraph with the Cassandra + ES combination and do data ingestion in OLAP mode. Since we submit batch requests, we use the commit method to commit the transaction and thereby start the actual data ingestion. The problem is: Cassandra is committed first, and if there is any issue during that commit, the JanusGraph core classes perform a rollback; but in those same core classes, no rollback is performed when there is an issue during the ES commit. Below is the JanusGraph class where we identified the issue.

In janusgraph-core.jar, this is the method that gets called internally in StandardJanusGraph.class.

As the marked block in the code below shows, even if there are errors during the ES commit, nothing is re-thrown; those errors are stored in a map and afterwards only printed as log statements. This leads to data inconsistency, because in the same code the transaction is rolled back when there are exceptions during the Cassandra commit, but for ES this does not happen.

When we look at the latest code for StandardJanusGraph.class, no exception is re-thrown there either, and the code mentions that this needs to be cleaned up. Does that mean this is something not implemented completely?

https://github.com/JanusGraph/janusgraph/blob/6bb2ba926b6cac2669f608f9461177d964ae0be0/janusgraph-core/src/main/java/org/janusgraph/graphdb/database/StandardJanusGraph.java#L761

Is this an issue with the janusgraph-core jars? Has a fix already been made?

public void commit(Collection<InternalRelation> addedRelations, Collection<InternalRelation> deletedRelations, StandardJanusGraphTx tx) {
    if (!addedRelations.isEmpty() || !deletedRelations.isEmpty()) {
        log.debug("Saving transaction. Added {}, removed {}", addedRelations.size(), deletedRelations.size());
        if (!tx.getConfiguration().hasCommitTime()) {
            tx.getConfiguration().setCommitTime(this.times.getTime());
        }

        Instant txTimestamp = tx.getConfiguration().getCommitTime();
        long transactionId = this.txCounter.incrementAndGet();
        if (!tx.getConfiguration().hasAssignIDsImmediately()) {
            this.idAssigner.assignIDs(addedRelations);
        }

        BackendTransaction mutator = tx.getTxHandle();
        boolean acquireLocks = tx.getConfiguration().hasAcquireLocks();
        boolean hasTxIsolation = this.backend.getStoreFeatures().hasTxIsolation();
        boolean logTransaction = this.config.hasLogTransactions() && !tx.getConfiguration().hasEnabledBatchLoading();
        KCVSLog txLog = logTransaction ? this.backend.getSystemTxLog() : null;
        TransactionLogHeader txLogHeader = new TransactionLogHeader(transactionId, txTimestamp, this.times);

        try {
            if (logTransaction) {
                Preconditions.checkNotNull(txLog, "Transaction log is null");
                txLog.add(txLogHeader.serializeModifications(this.serializer, LogTxStatus.PRECOMMIT, tx, addedRelations, deletedRelations), txLogHeader.getLogKey());
            }

            boolean hasSchemaElements = !Iterables.isEmpty(Iterables.filter(deletedRelations, SCHEMA_FILTER)) || !Iterables.isEmpty(Iterables.filter(addedRelations, SCHEMA_FILTER));
            Preconditions.checkArgument(!hasSchemaElements || !tx.getConfiguration().hasEnabledBatchLoading() && acquireLocks, "Attempting to create schema elements in inconsistent state");
            StandardJanusGraph.ModificationSummary commitSummary;
            if (hasSchemaElements && !hasTxIsolation) {
                BackendTransaction schemaMutator = this.openBackendTransaction(tx);

                try {
                    commitSummary = this.prepareCommit(addedRelations, deletedRelations, SCHEMA_FILTER, schemaMutator, tx, acquireLocks);

                    assert commitSummary.hasModifications && !commitSummary.has2iModifications;
                } catch (Throwable var42) {
                    schemaMutator.rollback();
                    throw var42;
                }

                try {
                    schemaMutator.commit();
                } catch (Throwable var40) {
                    log.error("Could not commit transaction [" + transactionId + "] due to storage exception in system-commit", var40);
                    throw var40;
                }
            }

            commitSummary = this.prepareCommit(addedRelations, deletedRelations, hasTxIsolation ? NO_FILTER : NO_SCHEMA_FILTER, mutator, tx, acquireLocks);
            if (commitSummary.hasModifications) {
                String logTxIdentifier = tx.getConfiguration().getLogIdentifier();
                boolean hasSecondaryPersistence = logTxIdentifier != null || commitSummary.has2iModifications;
                if (logTransaction) {
                    txLog.add(txLogHeader.serializePrimary(this.serializer, hasSecondaryPersistence ? LogTxStatus.PRIMARY_SUCCESS : LogTxStatus.COMPLETE_SUCCESS), txLogHeader.getLogKey(), mutator.getTxLogPersistor());
                }

                try {
                    mutator.commitStorage();
                } catch (Throwable var39) {
                    log.error("Could not commit transaction [" + transactionId + "] due to storage exception in commit", var39);
                    throw var39;
                }

                if (hasSecondaryPersistence) {
                    LogTxStatus status = LogTxStatus.SECONDARY_SUCCESS;
                    Map<String, Throwable> indexFailures = ImmutableMap.of();
                    boolean userlogSuccess = true;

                    try {
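                        // NOTE (the block referred to in the question above): index (ES)
                        // commit failures below are only collected into a map and logged;
                        // nothing is re-thrown and no rollback happens for the index backend.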
                        indexFailures = mutator.commitIndexes();
                        if (!((Map)indexFailures).isEmpty()) {
                            status = LogTxStatus.SECONDARY_FAILURE;
                            Iterator var20 = ((Map)indexFailures).entrySet().iterator();

                            while(var20.hasNext()) {
                                java.util.Map.Entry<String, Throwable> entry = (java.util.Map.Entry)var20.next();
                                log.error("Error while committing index mutations for transaction [" + transactionId + "] on index: " + (String)entry.getKey(), (Throwable)entry.getValue());
                            }
                        }

                        if (logTxIdentifier != null) {
                            try {
                                userlogSuccess = false;
                                Log userLog = this.backend.getUserLog(logTxIdentifier);
                                Future<Message> env = userLog.add(txLogHeader.serializeModifications(this.serializer, LogTxStatus.USER_LOG, tx, addedRelations, deletedRelations));
                                if (env.isDone()) {
                                    try {
                                        env.get();
                                    } catch (ExecutionException var37) {
                                        throw var37.getCause();
                                    }
                                }

                                userlogSuccess = true;
                            } catch (Throwable var38) {
                                status = LogTxStatus.SECONDARY_FAILURE;
                                log.error("Could not user-log committed transaction [" + transactionId + "] to " + logTxIdentifier, var38);
                            }
                        }
                    } finally {
                        if (logTransaction) {
                            try {
                                txLog.add(txLogHeader.serializeSecondary(this.serializer, status, (Map)indexFailures, userlogSuccess), txLogHeader.getLogKey());
                            } catch (Throwable var36) {
                                log.error("Could not tx-log secondary persistence status on transaction [" + transactionId + "]", var36);
                            }
                        }

                    }
                } else {
                    mutator.commitIndexes();
                }
            } else {
                mutator.commit();
            }

        } catch (Throwable var43) {
            log.error("Could not commit transaction [" + transactionId + "] due to exception", var43);

            try {
                mutator.rollback();
            } catch (Throwable var35) {
                log.error("Could not roll-back transaction [" + transactionId + "] after failure due to exception", var35);
            }

            if (var43 instanceof RuntimeException) {
                throw (RuntimeException)var43;
            } else {
                throw new JanusGraphException("Unexpected exception", var43);
            }
        }
    }
}
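
One conceivable way to surface these failures to the caller (a sketch only, not an actual JanusGraph fix; note that storage is already committed at this point, so throwing cannot roll that back -- it can only make the failure visible instead of burying it in the logs):

// hypothetical variant of the indexFailures handling shown above
indexFailures = mutator.commitIndexes();
if (!indexFailures.isEmpty()) {
    status = LogTxStatus.SECONDARY_FAILURE;
    for (java.util.Map.Entry<String, Throwable> entry : indexFailures.entrySet()) {
        log.error("Error while committing index mutations for transaction ["
                + transactionId + "] on index: " + entry.getKey(), entry.getValue());
    }
    // storage is already committed, so instead of only logging we could
    // fail loudly here and leave repair to the transaction recovery log
    throw new JanusGraphException(
            "Index commit failed for indexes: " + indexFailures.keySet());
}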


Re: The memory can only be add() during vertex program execute

Oleksandr Porunov <alexand...@...>
 

Hi Anjani,

This is the JanusGraph developers channel, meant for discussing the JanusGraph development process. You can get help with your issue in the following channel: https://groups.google.com/forum/#!forum/janusgraph-users


On Wednesday, April 29, 2020 at 10:27:53 AM UTC-7, anj...@... wrote:
Hi All,

We have a custom vertex program which finds connected nodes. We have a unique requirement to remove some duplicate data, and for that we are collecting all data using the collectAsMap() method.
But we get the error below when the load is high. For a low load it works fine.


Data Store: Cassandra
Index Store: Elasticsearch

Data :  ~300M Nodes & ~371M Edges. 
Spark Cluster: 100 Executors, 7 Cores



Job aborted due to stage failure: Task 199 in stage 14.0 failed 4 times, most recent failure: Lost task 199.3 in stage 14.0 (TID 9825, <ip>, executor 97): java.lang.IllegalArgumentException: The memory can only be add() during vertex program execute: gremlin.connectedNodeVertexProgram.voteToHalt
	at org.apache.tinkerpop.gremlin.process.computer.Memory$Exceptions.memoryAddOnlyDuringVertexProgramExecute(Memory.java:161)



I am not able to figure out why this exception is thrown; I don't see any OOM error. Please share thoughts/pointers.


Appreciate your help.


Thanks,

Anjani




The memory can only be add() during vertex program execute

anjani...@...
 

Hi All,

We have a custom vertex program which finds connected nodes. We have a unique requirement to remove some duplicate data, and for that we are collecting all data using the collectAsMap() method.
But we get the error below when the load is high. For a low load it works fine.


Data Store: Cassandra
Index Store: Elasticsearch

Data :  ~300M Nodes & ~371M Edges. 
Spark Cluster: 100 Executors, 7 Cores



Job aborted due to stage failure: Task 199 in stage 14.0 failed 4 times, most recent failure: Lost task 199.3 in stage 14.0 (TID 9825, <ip>, executor 97): java.lang.IllegalArgumentException: The memory can only be add() during vertex program execute: gremlin.connectedNodeVertexProgram.voteToHalt
	at org.apache.tinkerpop.gremlin.process.computer.Memory$Exceptions.memoryAddOnlyDuringVertexProgramExecute(Memory.java:161)



I am not able to figure out why this exception is thrown; I don't see any OOM error. Please share thoughts/pointers.


Appreciate your help.


Thanks,

Anjani




Re: [DISCUSS] Developer chat (+FoundationDB chat)

Henry Saputra <henry....@...>
 

Thanks for driving this, Florian


On Thu, Apr 16, 2020 at 6:25 AM Florian Hockmann <f...@...> wrote:
You should all be able to join. I'll also create a PR to add it to our README.md.

On Wednesday, April 15, 2020 at 14:16:36 UTC+2, Florian Hockmann wrote:
It looks like we have a wide consensus on starting a new channel for development discussions, but different opinions on whether we should directly create a dedicated channel for FoundationDB and also on whether we should switch to a different system than Gitter.
So, I will create a new janusgraph-dev channel on Gitter and we can then see whether we need dedicated channels for, e.g., FoundationDB. If contributors have a strong opinion on moving to a different system, then please start a separate thread for that so we can discuss it in general, as Henry also suggested. Since we didn't reach a consensus on this topic here, I don't want it to stop the creation of the dev channel.

If that sounds OK to everyone, I will create the janusgraph-dev channel tomorrow on Gitter.

On Tuesday, April 14, 2020 at 23:52:58 UTC+2, Henry Saputra wrote:
Sorry I just saw this discussions. Thanks for pinging me, Misha.

As Misha and Florian mentioned, we did some investigating and exploring of which "chat" tool we should use for JanusGraph.
We chose Gitter due to the low barrier to joining and starting discussions, and the low maintenance, compared to Slack.

The discussion Florian started was about a new channel for development discussions on Gitter for JanusGraph, which I think is a great idea, so +1 for it.
This will provide a real-time mechanism, in addition to our mailing list, to talk about development ideas and progress.
As for FoundationDB, I think we could just use the dev channel to discuss it instead of a dedicated channel.

We could discuss moving to or embracing another chat tool like Slack or Discord in another thread as a separate topic.

Thanks,

- Henry


On Tue, Apr 14, 2020 at 4:19 AM Florian Hockmann <f...@...> wrote:
I think Misha has good arguments for staying on Gitter. I personally don't think that we need a high entry barrier for a developer chat as we currently also don't get many non-dev questions in the Google group for developers. People instead ask in janusgraph-users (if they ask in a Google group that is) and I don't see why that should be different on Gitter as they can still ask their usage questions in the main Gitter chat that is already quite active.
We should also try to stay as open as possible with our development discussions so as not to exclude people, in my opinion. If we notice that we still get too many usage questions in a dev chat, then we can still create a private chat room in Gitter. (At least from a quick search, it looks like that's possible with Gitter.)

The other main argument against Gitter and pro Discord I see would be the voice chat, but I'm not sure how important that actually is for us.

Another topic that was discussed here was whether we actually need a dedicated channel just for FoundationDB. I don't have a strong opinion either way, but I suggest that we simply start with a general dev channel, see whether we have a lot of FoundationDB-specific discussions there, and then we can still create a dedicated channel.

On Tuesday, April 7, 2020 at 18:34:24 UTC+2, Jan Jansen wrote:


On 7. Apr 2020, at 18:09, 'Misha Brukman' via JanusGraph developers <jan...@...> wrote:


On Tue, Apr 7, 2020 at 11:36 AM 'Jan Jansen' via JanusGraph developers <jan...@...> wrote:
Some of your reasons are exactly why I'm against Gitter.
Currently, Gitter is just a support chat for JanusGraph. That channel would be hard to use for developer-focused talks. Therefore, we have created private or highly moderated channels just for development-focused topics. I think having an extra platform where you have to sign up separately would massively reduce the support questions.

Are you trying to have the same separation on Gitter as we do with janusgraph-users@ and janusgraph-dev@ mailing lists? Would creating separate channels on Gitter not address this? Or are you saying you've already done this, and it's not helping?
I thought about the separation of the Google groups. I think just a different channel wouldn't help, because some people simply ignore these rules. For example, on GitHub issues we have a template for questions which says please ask in the Google group, and they ignore it. In the Google group you see that less often. I think an extra barrier between the two channels would separate these topics better.
 
I didn't know about the history limitation in Slack.


The free version of Slack limits the workspace to the 10,000 most recent messages; however, my understanding is that this is a global number, not a per-channel number. In some free workspaces I've seen, this means ~zero history for the less active channels, because the busier channels continuously exhaust the 10K most recent messages; if you're not there to see a message, you'll see almost nothing.
 
I also mentioned Discord, which includes voice channels.

I didn't realize folks would enjoy live voice chat, but it does sound like an interesting option.

Misha 





Re: Batching Queries to backend for faster performance

Debasish Kanhar <d.k...@...>
 

Thanks Pavel.

I've tried all of those, and together with query optimizations they have helped me reduce the execution time by almost 50%. But that's still too slow for my use case. :-)

I'm looking at more and more batching options wherever I can.

You can check out my discussion on gremlin-users (https://groups.google.com/forum/#!topic/gremlin-users/RaIHVbDE5rk) for more clarity on the requirements and details; any ideas on how to implement this would be a great help :-)

On Thursday, 16 April 2020 21:20:40 UTC+5:30, Pavel Ershov wrote:

JG has three options to reduce the number of queries: https://docs.janusgraph.org/basics/configuration-reference/#query

PROPERTY_PREFETCHING -- enabled by default
USE_MULTIQUERY -- disabled by default
BATCH_PROPERTY_PREFETCHING -- disabled by default

The last two require the store to implement OrderedKeyValueStore.getSlices and to enable the multiQuery feature.



On Wednesday, April 8, 2020 at 19:10:38 UTC+3, Debasish Kanhar wrote:
For anyone following this thread: my primary question was how to implement a multi-get style call w.r.t. my backend. Do check Marko's comment (https://groups.google.com/d/msg/gremlin-users/QMVhLIPiGRE/Yf4ByrlrEQAJ) for clarification on what multi-get is.

The Snowflake backend interface which I've written doesn't support multiQuery yet. Is implementing multiQuery as simple as the steps mentioned below?

Do we need to uncomment the following in the StoreManager class?

[code omitted in the original post]

And set
features.supportMultiQuery = true;

And implement the following method in KeyValueStore?

[code omitted in the original post]

Or are there any other changes which need to be made to implement the multiQuery feature for my backend? (A sketch of the method in question follows below.)
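
For reference, a naive sketch of what that method could look like (the signature matches OrderedKeyValueStore; a real implementation would issue one batched request to the backend instead of this loop):

// sketch inside the custom KeyValueStore; imports assumed from
// org.janusgraph.diskstorage.* and java.util.*
@Override
public Map<KVQuery, RecordIterator<KeyValueEntry>> getSlices(List<KVQuery> queries, StoreTransaction txh) throws BackendException {
    Map<KVQuery, RecordIterator<KeyValueEntry>> results = new HashMap<>();
    for (KVQuery query : queries) {
        // naive fallback: one backend round trip per query. A batched
        // implementation would combine these into a single Snowflake request
        // and split the result set back out per query.
        results.put(query, getSlice(query, txh));
    }
    return results;
}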
The reason why I feel multiQuery will help us is because, in our simple use case:
g.V(20520).as("root").bothE().as("e").barrier().otherV().as("oth").barrier().project("r","e","o").by(select("root").valueMap()).by(select("e").valueMap()).by(select("oth").valueMap()).barrier().dedup("r", "e", "o").profile()



Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
GraphStep(vertex,[20520])@[root]                                       1           1        1157.329     2.66
JanusGraphVertexStep(BOTH,edge)@[e]                                   12          12        3854.693     8.86
    \_condition=(EDGE AND visibility:normal)
    \_isFitted=true
    \_vertices=1
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@801a60ee
    \_orders=[]
    \_isOrdered=true
    \_multi=true
  optimization                                                                                     7.573
  backend-query                                                       12                         828.938
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@801a60ee
NoOpBarrierStep                                                       12          12           1.828     0.00
EdgeOtherVertexStep@[oth]                                             12          12           1.132     0.00
NoOpBarrierStep                                                       12          12           0.976     0.00
ProjectStep([r, e, o],[[SelectOneStep(last,root...                    12          12       38502.413    88.46
  SelectOneStep(last,root)                                            12          12           0.508
  PropertyMapStep(value)                                              12          12       12776.430
  SelectOneStep(last,e)                                               12          12           0.482
  PropertyMapStep(value)                                              12          12       15211.454
  SelectOneStep(last,oth)                                             12          12           0.376
  PropertyMapStep(value)                                              12          12       10508.737
NoOpBarrierStep                                                       12          12           5.692     0.01
DedupGlobalStep([r, e, o])                                            12          12           1.925     0.00
                                            >TOTAL                     -           -       43525.993        -

As you can see, retrieval of the properties of graph elements (vertices and edges) is the most time-consuming step. On further analysis I realized this is because retrieving a single property from my backend is a single query to the backend. Thus, for N elements (vertices & edges), each with M properties, the total number of calls is N*M, which slows down the whole process and execution time.
Maybe that's the reason why the properties() step is the slowest step in my backend scenario.

So, will implementing multiQuery optimize the performance in such a scenario, and is there anything else which needs to be implemented as well? If yes, I can quickly implement this, we can immediately see some performance improvements, and adding the new backend moves closer to the finish line :-)

Thanks in advance.

On Tuesday, 7 April 2020 00:18:38 UTC+5:30, Debasish Kanhar wrote:
Hi All,

Well, the title may be misleading, as I couldn't think of a better one. Let me give a brief overview of the issue we are talking about and the possible solutions we are considering; we will need your suggestions, and help connecting with anyone in the community who can help us with the problem :-)

So, we have a requirement to implement Snowflake as a backend for JanusGraph (https://groups.google.com/forum/#!topic/janusgraph-dev/9JrMYF_01Cc). We were able to model Snowflake as a KeyValueStore and successfully created an interface layer which extends OrderedKeyValueStore to interact with Snowflake (https://gitlab.com/system-soft-technologies-opensource/janus-snowflake). The problem we now face was anticipated: slow response times, be it READ or WRITE. This is because every Gremlin query which TinkerPop/JanusGraph issues is broken down into multiple queries which are executed sequentially, one after another, to build the response to the Gremlin query.

For example, look at the attached file (query breakdown.txt): it shows how a simple Gremlin query like g.V().has("node_label", "user").limit(5).valueMap(true) is broken down into a set of multiple edgestore queries. (I'm not including queries to graphindex and janusgraph_ids, as those are low volume.) We have also been able to capture the order in which the queries are executed (the 1st line is the 1st query, the 2nd line is called second, and so on).

My problem here is: is there some way we can batch these queries? Since Snowflake is a data warehouse, each query takes hundreds of milliseconds to execute. Thus, having 100 sub-queries as in the example file easily takes 10 seconds minimum. We would like to optimize that by batching the queries together, so that they can be executed together and their responses reconciled together.

For example, if the flow is as follows:

[flow diagram from the original post, not preserved in this digest]
Can we change the flow above, which is the generic flow for TinkerPop databases, to something like the following by bringing in an accumulator/aggregator step?

Instead of our interface interacting directly with the Snowflake backend, we bring in an aggregation step in between.
The aggregation step accumulates all the getSlice queries (start key, end key & store name) until all queries which can be compartmentalized have been accumulated.
Once accumulated, it executes all of them together against the backend.
Once executed, we get all the queries' responses back at the aggregation step (output), break them down according to the input queries, and send them back to GraphStep for reconciliation and building the output of the Gremlin query.

As for the things we have been doing: we edited the JanusGraph core classes so that we can track the flow of information from one class to another whenever a Gremlin query executes, i.e. which classes are called, iteratively, until we reach our interface's getSlice method, and we look for repetitive patterns in the query. For that we have collected approximately 6000 lines of custom logs which we are tracking.
After analyzing the logs, we arrived at the following flow of classes:

[class-flow diagram from the original post, not preserved in this digest]

My question is: is this possible from a TinkerPop perspective? From a JanusGraph perspective? Our project is ready to pay any JanusGraph or TinkerPop expert part-time as a freelancer. We are looking for any experts in the domain who can help us achieve this problem statement. The payoff of this use case is tremendous: it can also lead to performance improvements in existing backends, and can help us execute a lot of memory-intensive queries much faster.

Thanks



Re: Batching Queries to backend for faster performance

Pavel Ershov <owner...@...>
 


JG has three options to reduce the number of queries: https://docs.janusgraph.org/basics/configuration-reference/#query

PROPERTY_PREFETCHING -- enabled by default
USE_MULTIQUERY -- disabled by default
BATCH_PROPERTY_PREFETCHING -- disabled by default

The last two require the store to implement OrderedKeyValueStore.getSlices and to enable the multiQuery feature. An example of enabling these follows below.
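
For illustration, a sketch of enabling all three when opening the graph (property names as per the configuration reference linked above; the storage settings are illustrative):

graph = JanusGraphFactory.build().
    set("storage.backend", "cql").
    set("storage.hostname", "127.0.0.1").
    set("query.fast-property", true).            // PROPERTY_PREFETCHING
    set("query.batch", true).                    // USE_MULTIQUERY
    set("query.batch-property-prefetch", true).  // BATCH_PROPERTY_PREFETCHING
    open()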



On Wednesday, April 8, 2020 at 19:10:38 UTC+3, Debasish Kanhar wrote:

For anyone following this thread: my primary question was how to implement a multi-get style call w.r.t. my backend. Do check Marko's comment (https://groups.google.com/d/msg/gremlin-users/QMVhLIPiGRE/Yf4ByrlrEQAJ) for clarification on what multi-get is.

The Snowflake backend interface which I've written doesn't support multiQuery yet. Is implementing multiQuery as simple as the steps mentioned below?

Do we need to uncomment the following in the StoreManager class?

[code omitted in the original post]

And set
features.supportMultiQuery = true;

And implement the following method in KeyValueStore?

[code omitted in the original post]

Or are there any other changes which need to be made to implement the multiQuery feature for my backend?
The reason why I feel multiQuery will help us is because, in our simple use case:
g.V(20520).as("root").bothE().as("e").barrier().otherV().as("oth").barrier().project("r","e","o").by(select("root").valueMap()).by(select("e").valueMap()).by(select("oth").valueMap()).barrier().dedup("r", "e", "o").profile()



Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
GraphStep(vertex,[20520])@[root]                                       1           1        1157.329     2.66
JanusGraphVertexStep(BOTH,edge)@[e]                                   12          12        3854.693     8.86
    \_condition=(EDGE AND visibility:normal)
    \_isFitted=true
    \_vertices=1
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@801a60ee
    \_orders=[]
    \_isOrdered=true
    \_multi=true
  optimization                                                                                     7.573
  backend-query                                                       12                         828.938
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@801a60ee
NoOpBarrierStep                                                       12          12           1.828     0.00
EdgeOtherVertexStep@[oth]                                             12          12           1.132     0.00
NoOpBarrierStep                                                       12          12           0.976     0.00
ProjectStep([r, e, o],[[SelectOneStep(last,root...                    12          12       38502.413    88.46
  SelectOneStep(last,root)                                            12          12           0.508
  PropertyMapStep(value)                                              12          12       12776.430
  SelectOneStep(last,e)                                               12          12           0.482
  PropertyMapStep(value)                                              12          12       15211.454
  SelectOneStep(last,oth)                                             12          12           0.376
  PropertyMapStep(value)                                              12          12       10508.737
NoOpBarrierStep                                                       12          12           5.692     0.01
DedupGlobalStep([r, e, o])                                            12          12           1.925     0.00
                                            >TOTAL                     -           -       43525.993        -

As you can see, retrieval of the properties of graph elements (vertices and edges) is the most time-consuming step. On further analysis I realized this is because retrieving a single property from my backend is a single query to the backend. Thus, for N elements (vertices & edges), each with M properties, the total number of calls is N*M, which slows down the whole process and execution time.
Maybe that's the reason why the properties() step is the slowest step in my backend scenario.

So, will implementing multiQuery optimize the performance in such a scenario, and is there anything else which needs to be implemented as well? If yes, I can quickly implement this, we can immediately see some performance improvements, and adding the new backend moves closer to the finish line :-)

Thanks in advance.

On Tuesday, 7 April 2020 00:18:38 UTC+5:30, Debasish Kanhar wrote:
Hi All,

Well, the title may be misleading, as I couldn't think of a better one. Let me give a brief overview of the issue we are talking about and the possible solutions we are considering; we will need your suggestions, and help connecting with anyone in the community who can help us with the problem :-)

So, we have a requirement to implement Snowflake as a backend for JanusGraph (https://groups.google.com/forum/#!topic/janusgraph-dev/9JrMYF_01Cc). We were able to model Snowflake as a KeyValueStore and successfully created an interface layer which extends OrderedKeyValueStore to interact with Snowflake (https://gitlab.com/system-soft-technologies-opensource/janus-snowflake). The problem we now face was anticipated: slow response times, be it READ or WRITE. This is because every Gremlin query which TinkerPop/JanusGraph issues is broken down into multiple queries which are executed sequentially, one after another, to build the response to the Gremlin query.

For example, look at the attached file (query breakdown.txt): it shows how a simple Gremlin query like g.V().has("node_label", "user").limit(5).valueMap(true) is broken down into a set of multiple edgestore queries. (I'm not including queries to graphindex and janusgraph_ids, as those are low volume.) We have also been able to capture the order in which the queries are executed (the 1st line is the 1st query, the 2nd line is called second, and so on).

My problem here is: is there some way we can batch these queries? Since Snowflake is a data warehouse, each query takes hundreds of milliseconds to execute. Thus, having 100 sub-queries as in the example file easily takes 10 seconds minimum. We would like to optimize that by batching the queries together, so that they can be executed together and their responses reconciled together.

For example, if the flow is as follows:

[flow diagram from the original post, not preserved in this digest]
Can we change the flow above, which is the generic flow for TinkerPop databases, to something like the following by bringing in an accumulator/aggregator step?

Instead of our interface interacting directly with the Snowflake backend, we bring in an aggregation step in between.
The aggregation step accumulates all the getSlice queries (start key, end key & store name) until all queries which can be compartmentalized have been accumulated.
Once accumulated, it executes all of them together against the backend.
Once executed, we get all the queries' responses back at the aggregation step (output), break them down according to the input queries, and send them back to GraphStep for reconciliation and building the output of the Gremlin query.

As for the things we have been doing: we edited the JanusGraph core classes so that we can track the flow of information from one class to another whenever a Gremlin query executes, i.e. which classes are called, iteratively, until we reach our interface's getSlice method, and we look for repetitive patterns in the query. For that we have collected approximately 6000 lines of custom logs which we are tracking.
After analyzing the logs, we arrived at the following flow of classes:

[class-flow diagram from the original post, not preserved in this digest]

My question is: is this possible from a TinkerPop perspective? From a JanusGraph perspective? Our project is ready to pay any JanusGraph or TinkerPop expert part-time as a freelancer. We are looking for any experts in the domain who can help us achieve this problem statement. The payoff of this use case is tremendous: it can also lead to performance improvements in existing backends, and can help us execute a lot of memory-intensive queries much faster.

Thanks

