
Re: Flatfile for Janusgraph Backend

Oleksandr Porunov
 

Hi,

JanusGraph supports Oracle Berkeley DB, which is an embedded Java database and works either with a filesystem (by default) or in-memory (if configured so).
That might be something you are looking for. For BDB to work, all you need to do is set 2 required parameters:
storage.directory = /path/to/directory/where/you/want/to/store/data/
storage.backend = berkeleyje

Thus, all your data will be stored in files on your local filesystem (or distributed, if used with something like NFS, GlusterFS, etc.).
That said, berkeleyje has some limitations. I would strongly recommend testing your use case before considering it for production, as it's quite easy to corrupt your files with BDB.
You may also want to set the configuration `storage.berkeleyje.lock-mode: LockMode.READ_UNCOMMITTED`; I remember having some transaction problems without it, though I no longer recall the details.
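For example, a minimal sketch of opening an embedded BerkeleyDB-backed graph from the Gremlin console (the directory path is a placeholder):

graph = JanusGraphFactory.build().
    set('storage.backend', 'berkeleyje').
    set('storage.directory', '/path/to/directory/where/you/want/to/store/data/').
    set('storage.berkeleyje.lock-mode', 'LockMode.READ_UNCOMMITTED').
    open()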

Hope it helps.

Best regards,
Oleksandr Porunov


Re: Performance Improvement

Vinayak Bali
 

Hi Oleksandr, 

Thank you for the detailed explanation regarding the configuration and indexes. I will dig deeper into it and try to resolve the problem.
But I think the queries I am executing are not efficient.
Could you share the Gremlin queries for the two cases mentioned in the previous mail? That would help a lot in validating the queries.

Thanks & Regards,
Vinayak


Re: Performance Improvement

Oleksandr Porunov
 

Hi Vinayak,

I didn't quite follow your statements about count, but I just want to add that if you don't use a mixed index for a count query, then the count will require iterating over each element and counting them in-memory (i.e. very inefficient). To check whether your count query is using a mixed index, you can use the `profile()` step.
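
For example, in the Gremlin console (the property name and value are placeholders):

g.V().has('someProperty', 'someValue').count().profile()

If the profile output shows the query being answered by the mixed index backend instead of by iterating over vertices, the index is being used.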

I also noticed that you say you need to return all properties for all vertices / edges. If so, you may consider using multiQuery, which in certain cases returns properties for your vertices faster than the valueMap() step. The one thing you need to consider when using `multiQuery` (actually, any query) is the tx-cache size (not to be confused with the database cache). If your tx-cache is too small to hold all the vertices, then some vertices' properties will be evicted from the cache. Thus, when you try to return values for those vertex properties, new database calls will be made to retrieve them. In the worst case, every property access may lead to a separate database call. To eliminate this downside, make sure your transaction cache size is at least the number of vertices you are accessing (or bigger). In that case `multiQuery().addAllVertices(yourVertices).properties()` will return all properties for all vertices and hold them in-memory instead of evicting them.
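
A minimal sketch of that pattern in the Gremlin console (the cache size and label are placeholders; `buildTransaction().vertexCacheSize(...)` is one way to size the tx-cache for a single transaction):

tx = graph.buildTransaction().vertexCacheSize(100000).start()
vertices = tx.traversal().V().hasLabel('someLabel').toList()
props = tx.multiQuery().addAllVertices(vertices).properties()
tx.commit()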

Moreover, it looks like your use cases are read-heavy rather than write-heavy. You may improve performance by making sure all your writes use consistency-level=ALL and all your reads use consistency-level=ONE. You may also want to disable the consistency check, as well as the internal / external verifications for your transactions, if you are sure about your data. It will make some of your queries faster, but less safe.
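
In CQL properties form that would look something like the following sketch (whether ALL writes / ONE reads is appropriate depends on your replication setup):

storage.cql.read-consistency-level = ONE
storage.cql.write-consistency-level = ALL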

You also need to make sure that your CQL driver throughput is configured optimally for your load. If your JanusGraph is embedded in your application, make sure the application has the smallest possible latency to your Cassandra nodes (you may even consider placing your application on the same nodes as Cassandra, or just moving your Gremlin Server to those nodes and using a remote connection).

There are many JanusGraph and CQL driver configurations which you may use to tune performance for your use case. The topic is too broad for a one-size-fits-all solution; different use cases may need different approaches. I would strongly recommend exploring all JanusGraph configuration options here: https://docs.janusgraph.org/configs/configuration-reference/ . Being aware of all the options will let you configure your general JanusGraph settings, your transactions, and your CQL driver much better. For advanced CQL configuration options, see the reference here: https://docs.datastax.com/en/developer/java-driver/4.13/manual/core/configuration/reference/ (storage.cql.internal in JanusGraph).
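
As an illustration of the storage.cql.internal passthrough, a hedged sketch (the driver option shown is just one example from the DataStax reference, not a recommendation):

storage.cql.internal.string-configuration = datastax-java-driver { advanced.connection.pool.local.size = 8 }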

You may also try exploring other storage backends which may give you lower latency (and hence better performance), like ScyllaDB, Aerospike, etc.

Best regards,
Oleksandr Porunov


Re: Template configuration parameters with comma-separated lists in Janusgraph 0.6.0

hadoopmarc@...
 

Hi Mladen,

You are right, your issue is different from the one I mentioned about GraphManager.

I mentioned earlier that the JanusGraph test suite covers your use case:
https://github.com/JanusGraph/janusgraph/blob/v0.6.0/janusgraph-cql/src/test/java/org/janusgraph/core/cql/CQLConfiguredGraphFactoryTest.java

You can verify in the CI logs that this test passes successfully; see e.g. a recent commit on master (the line starting with "Run mvn verify"):
https://github.com/JanusGraph/janusgraph/runs/3775465848?check_suite_focus=true

I do acknowledge that the return value you mention for
ConfiguredGraphFactory.getConfiguration('default_').get('storage.hostname')
==>[test-master, test-worker1]
looks suspicious, but in what way does your use case differ from the tests with "MultiHost" CQL?

Best wishes,    Marc


Re: Performance Improvement

Vinayak Bali
 

Hi All, 

I updated the JanusGraph version to 0.6.0 and added the parallel-execution configuration suggested by Oleksandr. Still, the performance has not improved. I think I am missing something, so I am describing my requirement in detail.

The attachment to this mail contains the data model that I am using for the queries.

This data model will be visible to the user on the UI. The user can choose any number of nodes and relationships from it. Based on the selection, I create queries to retrieve the data and the count respectively. If the count exceeds a certain limit, the user must add additional filters. Hence the count query is an important aspect of the implementation, and its performance matters. Let's consider some cases of the count queries:

Case 1: User Selection: Node1, Node3, and Relation3
Count Query Output Required: Node1: 25, Node3: 30, Relation3: 50, i.e. only the nodes that participate in the relationship (a sketch of such a query follows after Case 3)
Data Query: Must return all the data for Node1, Node3, and Relation3, including the properties

Case 2: User Selection: Node1, Node3, and Relation3, plus Node4, Node2, and Relation4
Count Query Output: Node1: 25, Node3: 30, Relation3: 50, Node4: 10, Node2: 34, Relation4: 45, again only the nodes that participate in the relationships
Data Query: Must return all the data for Node1, Node3, Relation3, Node4, Node2, and Relation4, including the properties

Case 3: Filters can be added to the above cases based on properties.
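
To make the expected semantics concrete, here is one possible shape for the Case 1 count query (a sketch only, assuming edges point from Node1 to Node3; labels are the placeholders from the data model, and this form is not necessarily efficient):

g.V().hasLabel('Node1').as('n1').
  outE('Relation3').as('r').
  inV().hasLabel('Node3').as('n3').
  select('n1', 'r', 'n3').fold().
  project('Node1', 'Relation3', 'Node3').
    by(unfold().select('n1').dedup().count()).
    by(unfold().select('r').count()).
    by(unfold().select('n3').dedup().count())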

I have tried using union along with aggregate steps, but the performance is not as required. The hardware configuration of the machine is not an issue.

I request you all to take a look and provide your valuable suggestions, based on experience, to solve the problem.
If possible, please share both the count and data queries for all cases, along with any configuration required to improve performance.

Thanks & Regards,
Vinayak


On Mon, Sep 13, 2021 at 2:22 AM Oleksandr Porunov <alexandr.porunov@...> wrote:
Hi Vinayak,

JanusGraph version 0.6.0 has been released. I posted some quick tips for improving throughput to your CQL storage here:
https://lists.lfaidata.foundation/g/janusgraph-users/message/6148
I also made a post on LinkedIn with links to the relevant documentation parts and several further suggestions about internal ExecutorServices usage here: https://www.linkedin.com/posts/porunov_release-060-janusgraphjanusgraph-activity-6840714301062307840-r6Uw

In 0.6.0 you can improve your CQL throughput drastically with a simple configuration, `storage.cql.executor-service.enabled: false`, which I definitely recommend; but you should then also tune the throughput-related configuration options properly.
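
For example, a sketch of the throughput-related settings mentioned in this thread (the values are placeholders to tune for your own load):

storage.cql.executor-service.enabled = false
storage.cql.max-requests-per-connection = 1024
storage.cql.local-max-connections-per-host = 5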

Best regards,
Oleksandr


Re: Potential transaction issue (JG 0.6.0)

sergeymetallic@...
 

Looking at the changes in 0.6.0, I think this problem was just hidden in the previous version, as resources were not released properly:

private void releaseTransaction() {
-       //TODO: release non crucial data structures to preserve memory?
        isOpen = false;
        graph.closeTransaction(this);
-       vertexCache.close();
+       vertexCache = null;
+       indexCache = null;
+       addedRelations = null;
+       deletedRelations = null;
+       uniqueLocks = null;
+       newVertexIndexEntries = null;
+       newTypeCache = null;
}


Potential transaction issue (JG 0.6.0)

sergeymetallic@...
 

The issue can only be reproduced under certain conditions; I cannot find a recipe to reproduce it in an arbitrary environment.
We have a query of type 
g.inject((int) 1).union(...).limit(5L)

We have several subqueries in "union" that return a large amount of data. While executing this query, we get the error:
"java.lang.NullPointerException: null
at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.getInternalVertex(StandardJanusGraphTx.java:508)
at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.lambda$new$6(StandardJanusGraphTx.java:1478)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.stream.SliceOps$1$1.accept(SliceOps.java:199)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
at java.base/java.util.stream.ReferencePipeline$11$1.accept(ReferencePipeline.java:442)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1812)
at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:294)
at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:169)
at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
at java.base/java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
at org.janusgraph.graphdb.util.SubqueryIterator.computeNext(SubqueryIterator.java:75)
at org.janusgraph.graphdb.util.SubqueryIterator.computeNext(SubqueryIterator.java:37)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
at org.janusgraph.graphdb.query.LimitAdjustingIterator.hasNext(LimitAdjustingIterator.java:71)
at org.janusgraph.graphdb.query.ResultSetIterator.nextInternal(ResultSetIterator.java:55)
at org.janusgraph.graphdb.query.ResultSetIterator.next(ResultSetIterator.java:70)
at org.janusgraph.graphdb.query.ResultSetIterator.next(ResultSetIterator.java:29)
at org.janusgraph.graphdb.util.CloseableIteratorUtils$1.computeNext(CloseableIteratorUtils.java:50)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
at org.janusgraph.graphdb.util.ProfiledIterator.computeNext(ProfiledIterator.java:41)
at org.janusgraph.graphdb.util.ProfiledIterator.computeNext(ProfiledIterator.java:27)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
at org.janusgraph.graphdb.util.MultiIterator.computeNext(MultiIterator.java:42)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
at org.janusgraph.graphdb.util.MultiDistinctUnorderedIterator.computeNext(MultiDistinctUnorderedIterator.java:48)
at org.janusgraph.graphdb.util.MultiDistinctUnorderedIterator.computeNext(MultiDistinctUnorderedIterator.java:26)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
at org.apache.tinkerpop.gremlin.process.traversal.step.map.GraphStep.processNextStart(GraphStep.java:149)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.ComputerAwareStep$EndStep.processNextStart(ComputerAwareStep.java:82)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.ComputerAwareStep.processNextStart(ComputerAwareStep.java:44)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
at org.apache.tinkerpop.gremlin.process.traversal.step.filter.FilterStep.processNextStart(FilterStep.java:37)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
at org.apache.tinkerpop.gremlin.process.traversal.step.filter.FilterStep.processNextStart(FilterStep.java:37)
at org.apache.tinkerpop.gremlin.process.traversal.step.filter.DedupGlobalStep.processNextStart(DedupGlobalStep.java:107)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
at org.apache.tinkerpop.gremlin.process.traversal.step.map.ScalarMapStep.processNextStart(ScalarMapStep.java:39)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:222)
at org.apache.tinkerpop.gremlin.server.util.TraverserIterator.fillBulker(TraverserIterator.java:69)
at org.apache.tinkerpop.gremlin.server.util.TraverserIterator.hasNext(TraverserIterator.java:56)
at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.handleIterator(TraversalOpProcessor.java:410)
at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.lambda$iterateBytecodeTraversal$0(TraversalOpProcessor.java:222)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
I was able to fix the problem by checking whether the transaction is open:

public InternalVertex getInternalVertex(long vertexId) {
    // TODO: temporary fix
    if (isClosed()) {
        return null;
    }
    // return vertex but potentially check for existence
    return vertexCache.get(vertexId, internalVertexRetriever);
}
This was working fine in JanusGraph 0.5.3.

The important part is that I use "limit" here; without it, everything works just fine. Maybe the transaction is closing earlier than needed? Also, I use a remote graph via WebSocket.

Any ideas about what might be wrong?


Re: Flatfile for Janusgraph Backend

hadoopmarc@...
 

No, JanusGraph does not have a storage backend for a single-node cluster that persists to a single file (as sqlite would). It is possible, though, to have a single Cassandra instance co-hosted on your JanusGraph machine. In fact, Cassandra is included in the janusgraph-full distribution to enable this out of the box.

Best wishes,   Marc


Flatfile for Janusgraph Backend

Vivek Singh Raghuwanshi
 

Hi Team,
I am working on an issue: we are currently using HBase and Solr in our JanusGraph setup.
Is it possible to replace HBase with a flat file, instead of an in-memory backend?
 
Thanks


Re: Template configuration parameters with comma-separated lists in Janusgraph 0.6.0

Mladen Marović
 

Hi Marc,

Thanks for the reply, but the error in the linked issue is a bit different; also, I'm already using graphManager: "org.janusgraph.graphdb.management.JanusGraphManager", which seemed to fix the linked issue.

I found a somewhat hackish workaround that seems to create the graph properly. Instead of using:

ConfiguredGraphFactory.create('default_');

I manually create a configuration from the template and open the graph:

// ConfiguredGraphFactory.create('default_');
map = new HashMap<String, Object>();
map.put('graph.graphname', 'default_');
ConfiguredGraphFactory.getTemplateConfiguration().each{k, v -> map.putIfAbsent(k, v)};
conf = new MapConfiguration(map);
ConfiguredGraphFactory.createConfiguration(conf);
ConfiguredGraphFactory.open('default_');

After doing this, the graph is created seemingly without errors and I can use it as before.

I would expect ConfiguredGraphFactory.create() to do more or less exactly what I did manually, but the results were different and incorrect. Could there be some changes in ConfiguredGraphFactory.create() that cause it to behave differently than it did before?

Kind regards,

Mladen Marović


Re: Template configuration parameters with comma-separated lists in Janusgraph 0.6.0

hadoopmarc@...
 

Hi Mladen,

After your original post, the following issue was reported:

https://github.com/JanusGraph/janusgraph/issues/2822

Marc


Re: Template configuration parameters with comma-separated lists in Janusgraph 0.6.0

Mladen Marović
 

Hi,

thanks for the replies.

However, my point was that I'm explicitly setting a value WITHOUT brackets when creating a template configuration in the startup script:

map.put('storage.hostname', 'test-master,test-worker1');

This also shows when I inspect that template configuration:

gremlin> ConfiguredGraphFactory.getTemplateConfiguration()
...
==>storage.hostname=test-master,test-worker1
...

However, when I create a new graph, it is created using a value WITH brackets:

gremlin> ConfiguredGraphFactory.getConfiguration('default_').get('storage.hostname')
==>[test-master, test-worker1]

This then produces the error when trying to open the graph.

To reiterate, I'm using a very similar script in 0.5.3 which works without any issues, but in 0.5.3 I disable list parsing manually when creating the template. In 0.6.0 the codebase switched to commons-configuration2 which has list parsing disabled by default, so I don't think I should have to disable it again. Even if I do, or use a different delimiter, it still doesn't work because creating the template configuration seems to work fine, but creating a new graph from that configuration does not.

I hope the issue itself is more clear now.

Kind regards,

Mladen Marović


Re: Template configuration parameters with comma-separated lists in Janusgraph 0.6.0

Boxuan Li
 

Hi Mladen,

As Marc pointed out, this scenario is tested in the JanusGraph codebase. I noticed your error log contains:

2021-09-21 14:11:15,347 [pool-12-thread-1] com.datastax.oss.driver.internal.core.ContactPoints - WARN  - Ignoring invalid contact point [test-master:9042 (unknown host [test-master)
2021-09-21 14:11:15,348 [pool-12-thread-1] com.datastax.oss.driver.internal.core.ContactPoints - WARN  - Ignoring invalid contact point test-worker1]:9042 (unknown host test-worker1])

It seems that you have redundant brackets around the value. Try using "test-master, test-worker1" rather than "[test-master, test-worker1]".

Let me know if this helps.
Best, Boxuan


Re: Template configuration parameters with comma-separated lists in Janusgraph 0.6.0

hadoopmarc@...
 

Hi Mladen,

The good news: there actually is an explicit MultiHost test for this in the janusgraph test suite:
https://github.com/JanusGraph/janusgraph/blob/v0.6.0/janusgraph-cql/src/test/java/org/janusgraph/core/cql/CQLConfiguredGraphFactoryTest.java

The bad news: I could not get this test to run in my local environment. When I run the following (this requires the Docker service to be running):
$ git checkout v0.6.0
$ mvn clean install -DskipTests
$ mvn test -Dtest=CQLConfiguredGraphFactoryTest -pl janusgraph-cql -Pcassandra3-murmur -e

all 21 tests fail with:
Caused by: org.janusgraph.graphdb.management.utils.ConfigurationManagementGraphNotEnabledException: Please add a key named "ConfigurationManagementGraph" to the "graphs" property in your YAML file and restart the server to be able to use the functionality of the ConfigurationManagementGraph class.
As far as I can see, the tests are run on GitHub CI, see: https://github.com/JanusGraph/janusgraph/blob/master/.github/workflows/ci-backend-cql.yml
However, I cannot access the CI logs right now.

I do not know where to go from here.

Best wishes,    Marc



Re: Can we find connected nodes without using gremlin query

hadoopmarc@...
 

Hi Anjani,

An individual query for a subgraph of 40 nodes will typically take between 0.1 and 1 seconds (with ScyllaDB being the fastest storage backend). The number of 200 TPS can easily be reached by scaling the number of transactions, JanusGraph instances, and storage backend nodes.

I guess this was already clear to you and you got it confirmed now!

Marc


Re: Unconfigured table exceptions in Janusgraph 0.6.0

Oleksandr Porunov
 

Great to hear that worked for you.

Caused by: com.datastax.oss.driver.api.core.DriverTimeoutException: Query timed out after PT2S
As seen from this PR: https://github.com/JanusGraph/janusgraph/pull/2812 , the default CQL request timeout of 10 seconds isn't applied if you don't provide it explicitly. Thus, it now defaults to 2 seconds (as seen in the DataStax driver default configuration). You can set it explicitly by providing the following config: `storage.cql.request-timeout = 10000` (to make it a 10-second timeout).

Best regards,
Oleksandr Porunov


Re: Can we find connected nodes without using gremlin query

anjanisingh22@...
 

Thanks, Marc, for the response.

Our graph is very large, around 1 TB, so it cannot be held in memory.
The subgraphs we want to retrieve can have 30-40 connected nodes.
We have a requirement of around 200 TPS.
We are using JanusGraph only; I meant that I am using Gremlin queries to search in JanusGraph.

Thanks,
Anjani
 


Re: JanusGraph 0.6.0 Binary Driver failing on serialization

hadoopmarc@...
 

Hi Chris,

Your report is hard to interpret. It may be worthwhile to try the documented way of remote connection with traversal().withRemote(...) or Cluster.open(...):

https://tinkerpop.apache.org/docs/current/reference/#gremlin-java-connecting

There should be no need to instantiate serializer objects yourself.
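
For instance, a minimal sketch in the Gremlin console (the YAML path and the 'g' traversal source name are placeholders for your own setup):

cluster = Cluster.open('conf/remote-objects.yaml')
g = traversal().withRemote(DriverRemoteConnection.using(cluster, 'g'))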

Best wishes,    Marc


Template configuration parameters with comma-separated lists in Janusgraph 0.6.0

Mladen Marović
 

Hello,

I wanted to try out the new Janusgraph 0.6.0 release and I encountered some unexpected issues while trying to deploy it on Cassandra 3.11.5.

One of the issues I came across seems to be connected to the switch to commons-configuration2. In previous Janusgraph versions, I used a startup script to create a configuration template for graphs and a single default_ graph:

def globals = [:]

globals << [hook : [
    onStartUp: { ctx ->
        def map = new HashMap<String, Object>();

        map.put('schema.default', 'none');
        map.put('schema.constraints', 'true');

        map.put('graph.allow-upgrade', 'false');

        map.put('storage.backend', 'cql');
        map.put('storage.hostname', 'test-master,test-worker1');
        map.put('storage.cql.replication-factor', '1');
        map.put('storage.cql.read-consistency-level', 'LOCAL_QUORUM');
        map.put('storage.cql.write-consistency-level', 'LOCAL_QUORUM');
        map.put('storage.cql.only-use-local-consistency-for-system-operations', 'true');
        map.put('storage.cql.local-datacenter', 'dc1');
        map.put('storage.cql.replication-strategy-options', 'dc1,1');
        map.put('storage.cql.replication-strategy-class', 'NetworkTopologyStrategy');

        map.put('index.es.backend', 'elasticsearch');
        map.put('index.es.hostname', 'test-dev-elasticsearch');
        map.put('index.es.elasticsearch.client-only', "true");
        map.put('index.es.elasticsearch.create.ext.index.number_of_shards', '5');
        map.put('index.es.elasticsearch.create.ext.index.number_of_replicas', '1');

        def conf = new MapConfiguration(map);
        conf.setDelimiterParsingDisabled(true);

        if (ConfiguredGraphFactory.getTemplateConfiguration() == null) {
            ConfiguredGraphFactory.createTemplateConfiguration(conf);
            ctx.logger.info("Successfully created the template configuration");
        } else {
            ConfiguredGraphFactory.updateTemplateConfiguration(conf);
            ctx.logger.info("Successfully updated the template configuration");
        }

        if (ConfiguredGraphFactory.getConfiguration('default_') == null) {
            ConfiguredGraphFactory.create('default_');
            ctx.logger.info("Successfully created the graph 'default_'");
        }
    }
] as LifeCycleHook]

An important piece of this was the line conf.setDelimiterParsingDisabled(true); which was a workaround to disable automatically parsing multiple comma-separated hostnames and other similar parameters as lists and keep them as strings. In commons-configuration2 this method does not exist and the docs (https://commons.apache.org/proper/commons-configuration/userguide/upgradeto2_0.html, https://commons.apache.org/proper/commons-configuration/apidocs/org/apache/commons/configuration2/MapConfiguration.html) state that list splitting is disabled by default. This is also confirmed here: https://github.com/JanusGraph/janusgraph/issues/1447#issuecomment-851119479.
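
(For completeness, the explicit commons-configuration2 equivalent of the old workaround would presumably be something like:

conf.setListDelimiterHandler(org.apache.commons.configuration2.convert.DisabledListDelimiterHandler.INSTANCE);

although, as the docs state, list splitting should already be disabled by default.)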

After installing Janusgraph 0.6.0 I used a similar script, but with conf.setDelimiterParsingDisabled(true); commented out:

def globals = [:]

globals << [hook : [
    onStartUp: { ctx ->
        def map = new HashMap<String, Object>();

        map.put('schema.default', 'none');
        map.put('schema.constraints', 'true');

        map.put('graph.allow-upgrade', 'false');

        map.put('storage.backend', 'cql');
        map.put('storage.hostname', 'test-master,test-worker1');
        map.put('storage.cql.replication-factor', '1');
        map.put('storage.cql.read-consistency-level', 'LOCAL_QUORUM');
        map.put('storage.cql.write-consistency-level', 'LOCAL_QUORUM');
        map.put('storage.cql.only-use-local-consistency-for-system-operations', 'true');
        map.put('storage.cql.local-datacenter', 'dc1');
        map.put('storage.cql.replication-strategy-options', 'dc1,1');
        map.put('storage.cql.replication-strategy-class', 'NetworkTopologyStrategy');
        map.put('storage.cql.local-max-connections-per-host', 5)
        map.put('storage.cql.max-requests-per-connection', 1024)
        map.put('storage.cql.executor-service.enabled', 'false')
        map.put('storage.parallel-backend-executor-service.core-pool-size', 100)

        map.put('index.es.backend', 'elasticsearch');
        map.put('index.es.hostname', 'test-dev-elasticsearch');
        map.put('index.es.elasticsearch.client-only', "true");
        map.put('index.es.elasticsearch.create.ext.index.number_of_shards', '5');
        map.put('index.es.elasticsearch.create.ext.index.number_of_replicas', '1');

        map.put('storage.cql.internal.string-configuration', 'datastax-java-driver { advanced.metadata.schema.debouncer.window = 1 second }');

        map.put('query.smart-limit', 'false')

        def conf = new MapConfiguration(map);
        // conf.setDelimiterParsingDisabled(true);

        if (ConfiguredGraphFactory.getTemplateConfiguration() == null) {
            ConfiguredGraphFactory.createTemplateConfiguration(conf);
            ctx.logger.info("Successfully created the template configuration");
        } else {
            ConfiguredGraphFactory.updateTemplateConfiguration(conf);
            ctx.logger.info("Successfully updated the template configuration");
        }

        if (ConfiguredGraphFactory.getConfiguration('default_') == null) {
            ConfiguredGraphFactory.create('default_');
            ctx.logger.info("Successfully created the graph 'default_'");
        }
    }
] as LifeCycleHook]

The script executed without errors, but the graph default_ cannot be opened. I get the following exception:

2021-09-21 14:11:15,347 [pool-12-thread-1] com.datastax.oss.driver.internal.core.ContactPoints - WARN  - Ignoring invalid contact point [test-master:9042 (unknown host [test-master)
2021-09-21 14:11:15,348 [pool-12-thread-1] com.datastax.oss.driver.internal.core.ContactPoints - WARN  - Ignoring invalid contact point test-worker1]:9042 (unknown host test-worker1])
2021-09-21 14:11:15,352 [JanusGraph Session-admin-0] com.datastax.oss.driver.internal.core.time.Clock - INFO  - Using native clock for microsecond precision
2021-09-21 14:11:15,352 [JanusGraph Session-admin-0] com.datastax.oss.driver.internal.core.metadata.MetadataManager - INFO  - [JanusGraph Session] No contact points provided, defaulting to /127.0.0.1:9042
2021-09-21 14:11:15,355 [JanusGraph Session-admin-1] com.datastax.oss.driver.internal.core.control.ControlConnection - WARN  - [JanusGraph Session] Error connecting to Node(endPoint=/127.0.0.1:9042, hostId=null, hashCode=74fef982), trying next node (ConnectionInitException: [JanusGraph Session|control|connecting...] Protocol initialization request, step 1 (OPTIONS): failed to send request (io.netty.channel.StacklessClosedChannelException))
2021-09-21 14:11:17,434 [pool-12-thread-1] org.janusgraph.graphdb.management.JanusGraphManager - ERROR - Failed to open graph default_ with the following error:
 java.lang.IllegalArgumentException: Could not instantiate implementation: org.janusgraph.diskstorage.cql.CQLStoreManager.
Thus, it and its traversal will not be bound on this server.

This indicates that the string was split somewhere along the way, and the resulting array [test-master, test-worker1] was stored as the property value. Inspecting the configuration in the gremlin console also points to this:

gremlin> ConfiguredGraphFactory.getConfiguration('default_').get('storage.hostname')
==>[test-master, test-worker1]
gremlin> 
gremlin> ConfiguredGraphFactory.getConfiguration('default_').get('storage.hostname').getClass()
==>class java.lang.String
gremlin> 

I even tried to manually set the delimiter in my script to be something other than ',' (with something like conf.setListDelimiterHandler(new org.apache.commons.configuration2.convert.DefaultListDelimiterHandler(';' as char));), but that didn't help either.

Did anyone else come across this issue? Did I miss something else in the changelog about this?

Kind regards,

Mladen Marović


Re: Can we find connected nodes without using gremlin query

hadoopmarc@...
 

Hi Anjani,

How large is your graph? Can you hold it in memory in its entirety?
How large are the subgraphs you want to retrieve in terms of width and size?
What are your requirements regarding TPS?
When you refer to "gremlin" what graph system did you use instead of JanusGraph?

Best wishes,    Marc
