
Re: Count Query Optimisation

hadoopmarc@...
 

Hi Vinayak,

For other readers, see also this other recent thread.

A couple of remarks:
  • In the separate edge count, does it make any difference if you select the edges by label rather than by property, i.e. g.E().hasLabel('Edge2').dedup().count()? You can see in the JanusGraph data model that the edge label is somewhat easier to access than its properties (see the sketch after this list).
  • If you use an indexing backend, it is also possible to do some simple counts against the index, but this will not help you out for your original query.
  • You also asked about using Spark. Most of the time, OLAP performance is (still) disappointing. But if you need more details you will have to show what you have tried and what problems you encountered.
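
As a minimal sketch of the comparison in the first bullet (labels taken from this thread):

// property-based: the 'title' property must be deserialized from each edge before filtering
g.E().has('title', 'Edge2').dedup().count()
// label-based: the label is part of the edge entry itself, so it is cheaper to check
g.E().hasLabel('Edge2').dedup().count()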

Best wishes,     Marc


ERROR org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - Application has been killed. Reason: All masters are unresponsive! Giving up.

Vinayak Bali
 

Hi All, 

gremlin> graph = GraphFactory.open('conf/hadoop-graph/read-cql-standalone-cluster.properties')
==>hadoopgraph[cqlinputformat->nulloutputformat]
gremlin> g = graph.traversal().withComputer(SparkGraphComputer)
==>graphtraversalsource[hadoopgraph[cqlinputformat->nulloutputformat], sparkgraphcomputer]
gremlin> g.V().has('title','Plant').count()
11:09:18 WARN  org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer  - class org.apache.hadoop.mapreduce.lib.output.NullOutputFormat does not implement PersistResultGraphAware and thus, persistence options are unknown -- assuming all options are possible
11:09:20 WARN  org.apache.spark.util.Utils  - Your hostname, ip-xx-xx-xx-xx resolves to a loopback address: 127.0.0.1; using xx.xx.xx.xx instead (on interface ens5)
11:09:20 WARN  org.apache.spark.util.Utils  - Set SPARK_LOCAL_IP if you need to bind to another address
11:10:25 ERROR org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend  - Application has been killed. Reason: All masters are unresponsive! Giving up.
11:10:25 WARN  org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend  - Application ID is not initialized yet.
11:10:25 WARN  org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint  - Drop UnregisterApplication(null) because has not yet connected to master
11:10:25 WARN  org.apache.spark.metrics.MetricsSystem  - Stopping a MetricsSystem that is not running
11:10:26 ERROR org.apache.spark.SparkContext  - Error initializing SparkContext.
java.lang.NullPointerException
at org.apache.spark.SparkContext.<init>(SparkContext.scala:560)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
at org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
at org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:52)
at org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:60)
at org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:313)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
11:10:26 ERROR org.apache.spark.scheduler.AsyncEventQueue  - Listener AppStatusListener threw an exception
java.lang.NullPointerException
at org.apache.spark.status.AppStatusListener.onApplicationEnd(AppStatusListener.scala:157)
at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:57)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:91)
at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$super$postToAll(AsyncEventQueue.scala:92)
at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply$mcJ$sp(AsyncEventQueue.scala:92)
at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:87)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$1$$anonfun$run$1.apply$mcV$sp(AsyncEventQueue.scala:83)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1302)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$1.run(AsyncEventQueue.scala:82)
java.lang.NullPointerException
Type ':help' or ':h' for help.
Display stack trace? [yN]

Hadoop: 3.3.0
Spark: 2.2.2
Scala: 2.11.2
JanusGraph: 0.5.2

I referred to the below documentation for the configuration files: 
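
For reference, a read-cql-standalone-cluster.properties typically contains lines like the following (a sketch with placeholder values, not my actual file):

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
janusgraphmr.ioformat.conf.storage.backend=cql
janusgraphmr.ioformat.conf.storage.hostname=127.0.0.1
spark.master=spark://CLUSTER-MASTER:7077
spark.serializer=org.apache.spark.serializer.KryoSerializer

The "All masters are unresponsive" error above suggests the spark.master URL could not be reached from the driver.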

Thanks & Regards,
Vinayak


Re: How to change GLOBAL_OFFLINE configuration when graph can't be instantiated

hadoopmarc@...
 

Hi,
JanusGraph keeps a record of open instances that sometimes is not updated properly. You can clean it with the methods here:
https://docs.janusgraph.org/advanced-topics/recovery/#janusgraph-instance-failure

Maybe it is not a problem if your graph is dropped entirely; in that case, you can also check:
https://docs.janusgraph.org/basics/common-questions/#dropping-a-database
After dropping, the graph can be recreated with the right configs.
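
As a sketch, the instance cleanup from the first link looks like this in the Gremlin Console (the instance id shown is an example):

mgmt = graph.openManagement()
mgmt.getOpenInstances()   // lists all registered instances, e.g. 7f0001016161-host1(current)
mgmt.forceCloseInstance('7f0001016161-host1')   // force-close the stale instance
mgmt.commit()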

Best wishes,     Marc


How to change GLOBAL_OFFLINE configuration when graph can't be instantiated

toom@...
 

Hi,

I am in a case where the index backend has been incorrectly configured to Elasticsearch. Now, when I try to instantiate my graph database, I get a "ConnectException: Connection refused", even if I set('index.search.backend', 'lucene') in JanusGraphFactory.
The setting index.search.backend is GLOBAL_OFFLINE, so I should update it while the graph is instantiated, but how can I change it if the instantiation fails?

Thanks


Count Query Optimisation

Vinayak Bali
 

Hi All, 

The Data Model of the graph is as follows:

Nodes:

Label: Node1, count: 130K
Label: Node2, count: 183K
Label: Node3, count: 437K
Label: Node4, count: 156

Relations:

Node1 to Node2 Label: Edge1, count: 9K
Node2 to Node3 Label: Edge2, count: 200K
Node2 to Node4 Label: Edge3, count: 71K
Node4 to Node3 Label: Edge4, count: 15K
Node4 to Node1 Label: Edge5 , count: 1K

The Count query used to get vertex and edge count :

g2.V().has('title', 'Node2').aggregate('v').
  outE().has('title', 'Edge2').aggregate('e').
  inV().has('title', 'Node3').aggregate('v').
  select('v').dedup().as('vertexCount').
  select('e').dedup().as('edgeCount').
  select('vertexCount', 'edgeCount').by(unfold().count())

This query takes around 3.5 mins to execute and the output returned is as follows:
[{"vertexCount":383633,"edgeCount":200166}]

The problem is that traversing the edges takes more time:
g.V().has('title', 'Node3').dedup().count() takes 3 seconds to return 437K nodes.
g.E().has('title', 'Edge2').dedup().count() takes 1 minute to return 200K edges.

In some cases, subsequent calls are faster due to cache usage.
I also considered the in-memory backend, but the data is large and I don't think that will work. Is there any way to cache the result on the first execution of a query? Or any approach to load the graph from the cql backend into memory to improve performance?

Please help me improve the performance; the count query should not take this much time.

JanusGraph: 0.5.2
Storage: Cassandra (cql)
The server specification is high, so that is not the issue.

Thanks & Regards,
Vinayak



Re: Traversal binding of dynamically created graphs are not propagated in multi-node cluster

hadoopmarc@...
 

Hi Anton,

No, my last post only concerned the gremlin server on port 8185, although the
first line of step 3 should have been (this was a hand-edit error):
    :remote connect tinkerpop.server conf/remote8185.yaml session
The gremlin server on port 8182 from janusgraph.sh is ignored.
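
In case it helps other readers: conf/remote8185.yaml is simply the stock conf/remote.yaml with the port changed, along these lines (a sketch):

hosts: [localhost]
port: 8185
serializer: { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true, ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}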

Anyway, the link to the successful test on GitHub actually held the key to some
more insight. It turns out that our issue (bindings are not automatically
generated after at most 20 seconds) is absent if you use the sequence
createTemplateConfiguration() and create(). Unfortunately, this only holds on
the same server where the new configuration was created.

So, I will report this all as an issue and you can comment on it if necessary.

Best wishes,    Marc


Re: Traversal binding of dynamically created graphs are not propagated in multi-node cluster

Anton Eroshenko <erosh.anton@...>
 

Hi Marc, 
I'm glad that you managed to reproduce it in the Gremlin Console. But I believe that in fact you do it with two JanusGraph servers, not with a single server as you assumed. As far as I understand, janusgraph.sh in step 1 and gremlin-server.sh in step 2 both start a JanusGraph instance. So I think your test scenario is close to a multi-node configuration. That's why the single-node test you mentioned could not catch this issue. For a single node it works fine. 
So should I file an issue on the project's GitHub? 


Re: Count Query Optimization

Vinayak Bali
 

Hi All, 

Setting query.batch = true AND query.fast-property = true 
doesn't work; I am facing the same problem. Is there any other way?

Thanks & Regards,
Vinayak

On Mon, Mar 22, 2021 at 6:06 PM Boxuan Li <liboxuan@...> wrote:
Have you tried keeping query.batch = true AND query.fast-property = true?

Regards,
Boxuan

On Mar 22, 2021, at 8:28 PM, Vinayak Bali <vinayakbali16@...> wrote:

Hi All,

Adding these properties to the configuration file affects edge traversal. Retrieving a single edge takes 7 minutes. 
1) Turn on query.batch
2) Turn off query.fast-property
The count query is faster, but edge traversal becomes much more expensive.
Is there any other way to improve count performance without affecting other queries?

Thanks & Regards,
Vinayak

On Fri, Mar 19, 2021 at 1:53 AM AMIYA KUMAR SAHOO <amiyakr.sahoo91@...> wrote:
Hi Vinayak,

Try below. If it works for you, you can add E2 and D similarly (a sketch of that extension follows the query).

g.V().has('property1', 'A').
  outE().has('property1', 'E').as('e').
  inV().has('property1', 'B').
  outE().has('property1', 'E1').as('e').
  where(inV().has('property1', 'C')).
  select(all, 'e').fold().
  project('edgeCount', 'vertexCount').
    by(count(local)).
    by(unfold().bothV().dedup().count())
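
As a sketch, the extension to A->E->B->E1->C->E2->D follows the same pattern (untested):

g.V().has('property1', 'A').
  outE().has('property1', 'E').as('e').
  inV().has('property1', 'B').
  outE().has('property1', 'E1').as('e').
  inV().has('property1', 'C').
  outE().has('property1', 'E2').as('e').
  where(inV().has('property1', 'D')).
  select(all, 'e').fold().
  project('edgeCount', 'vertexCount').
    by(count(local)).
    by(unfold().bothV().dedup().count())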

Regards,
Amiya

On Thu, 18 Mar 2021, 15:47 Vinayak Bali, <vinayakbali16@...> wrote:
Amiya - I need to check the data, there is some mismatch with the counts.

Consider we have more than one relation to get the count. How can we modify the query?

For example:
 
A->E->B query is as follows:
g.V().has('property1', 'A').
  outE().has('property1', 'E').
  where(inV().has('property1', 'B')).fold().
  project('edgeCount', 'vertexCount').
    by(count(local)).
    by(unfold().bothV().dedup().count())

A->E->B->E1->C->E2->D

What changes can be made in the query ??

Thanks



On Thu, Mar 18, 2021 at 1:59 PM AMIYA KUMAR SAHOO <amiyakr.sahoo91@...> wrote:
Hi Vinayak,

The correct vertex count is (400332 non-unique, 34693 unique).

With g.V().has('property1', 'A').aggregate('v'), all the vertices having property1 = 'A' might be getting included in the count in your second query because of eager evaluation (whether or not they have an outE with property1 = 'E').
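
A minimal sketch of the difference (fragments, not full queries):

// eager: aggregate('v') stores every 'A' vertex, even those dropped by the later edge filter
g.V().has('property1', 'A').aggregate('v').outE().has('property1', 'E')
// filter first, so only 'A' vertices that actually have such an edge are aggregated
g.V().has('property1', 'A').where(outE().has('property1', 'E')).aggregate('v')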

Regards,
Amiya








Re: Traversal binding of dynamically created graphs are not propagated in multi-node cluster

hadoopmarc@...
 

You could also check the scenario at line 65 of:

https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-server/src/test/java/org/janusgraph/graphdb/tinkerpop/ConfigurationManagementGraphServerTest.java

This is with the inmemory storage backend rather than cassandra.

Marc


Re: Traversal binding of dynamically created graphs are not propagated in multi-node cluster

hadoopmarc@...
 

Hi Anton,

OK, it took me some time to reach your level of understanding, but hopefully the
scenario below really starts adding to our common understanding. While the
issue hurts you in a setup with multiple gremlin servers, the issue already
appears in a setup with a single gremlin server.

The scenario comprises the following steps:
1. start Cassandra with:
   $ bin/janusgraph.sh start
   
2. start gremlin server:
   $ bin/gremlin-server.sh conf/gremlin-server/gremlin-server-configuration8185.yaml
   
3. connect with a gremlin console and run the following commands:

gremlin> :remote connect tinkerpop.server conf/remote.yaml session
==>Configured localhost/127.0.0.1:8185-[70e1320f-5c24-4804-9851-cc59db23e78e]
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8185]-[70e1320f-5c24-4804-9851-cc59db23e78e] - type ':remote console' to return to local mode
gremlin> map = new HashMap<String, Object>();
gremlin> map.put("storage.backend", "cql");
==>null
gremlin> map.put("storage.hostname", "127.0.0.1");
==>null
gremlin> map.put("graph.graphname", "graph6");
==>null
gremlin> ConfiguredGraphFactory.createConfiguration(new MapConfiguration(map));
==>null

... wait > 20 seconds
... new remote connection required for bindings to take effect

gremlin> :remote connect tinkerpop.server conf/remote8185.yaml session
==>Configured localhost/127.0.0.1:8185-[a1ddd2f3-9ab3-4eee-a415-1aa4ea57ca66]
gremlin> graph6
No such property: graph6 for class: Script8
Type ':help' or ':h' for help.
Display stack trace? [yN]n
gremlin> ConfiguredGraphFactory.getGraphNames()
==>graph5
==>graph4
==>graph3
==>graph2
==>graph1
==>graph6
gremlin>

If you now restart the gremlin server and reconnect in gremlin console,
graph6 is opened on the server and available as binding in the console.

So, indeed the automatic opening + binding of graphs as intended in line 105 of
https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-core/src/main/java/org/janusgraph/graphdb/management/JanusGraphManager.java
is somehow not functional.

Did we formulate the issue as succinctly as possible now?

Best wishes,     Marc


Re: Poor performance for some simple queries - bigtable/hbase

Boxuan Li
 

Hi,

> 1. Is this behavior expected, or it's just bigtable or hbase that might have this issue?


This (very likely) is not related to bigtable/hbase, but to JanusGraph itself.


> 2. What is the expected behavior of "query.batch"? Does the behavior that I observe mean that my "query.batch" is not taking effect?


The expected behavior is that it tries to batch the backend queries where possible (the actual implementation may depend on the storage backend you use, but at least for CQL, JanusGraph uses a thread pool to fire the backend queries concurrently).


Yes, I think the poor performance you observed is due to query.batch not taking effect. Usually this means batch optimization for that kind of query/scenario is missing. It's not technically impossible; it's just an area that needs to be worked on. For example, the values() step can leverage batching while the valueMap() step cannot. We have an open issue for this: #2444.


> 3. Any suggestions that I can try to improve this will be greatly appreciated.


1. The best way is to help improve the JanusGraph source code in this area and contribute back to the community :P In case you are interested, a good starting point is to read JanusGraphLocalQueryOptimizerStrategy.


2. In some cases, you could split your single traversal into multiple steps and do the batching (i.e. multi-threading) yourself. In your second example, you could use BFS and do batching for each level; a sketch follows.
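
A minimal sketch of that manual level-by-level batching, assuming the frontier of vertex ids fits in memory ('node', 'fqn' and 'flowsTo' are taken from your queries):

// fetch each BFS level in one bulk call instead of one deep repeat()
frontier = g.V().has('node', 'fqn', 'xxxx').out('contains').id().toList()
visited = new HashSet(frontier)
while (!frontier.isEmpty()) {
    // g.V(ids...) resolves the whole level at once, so the backend lookups can be batched
    next = g.V(frontier.toArray()).in('flowsTo').id().toList()
    frontier = next.findAll { visited.add(it) }   // keep only ids not seen before
}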


Hope this helps,

Boxuan


On Thu, Apr 1, 2021 at 2:05 AM, <liqingtaobkd@...> wrote:

Hi,


We are running JanusGraph on GCP with Bigtable as the backend. I have observed some query behavior that really confuses me. Basically, I am guessing that batch fetching from the backend is not happening for some queries, even though I did set "query.batch" to true.


To start, here is my basic query. Basically it tries to trace upstream and find a subgraph.


Query 1: find a 20-level subgraph. Performance is good. 

g.V().has('node', 'fqn', 'xxxx').out('contains').repeat(__.in('flowsTo')).times(20)


Query 2: traverse until there are no incoming edges. Performance is NOT good.

g.V().has('node', 'fqn', 'xxxx').out('contains').repeat(__.in('flowsTo')).until(inE().count().is(0))


Query 3: add a vertex property filter. Performance is NOT good.

g.V().has('node', 'fqn', 'xxxx').out('contains').repeat(__.in('flowsTo').has('type', 'column')).times(20)


Query 4: instead of a vertex property filter, get back the values of the property and then filter. Performance is good.

g.V().has('node', 'fqn', 'xxxx').out('contains').repeat(__.in('flowsTo').as('a').values('type').is('column').select('a')).times(20)


Looking at the profile result (attached), the backend fetching behavior looks very different. It looks like for query 1&4, it batch-fetches from the backend, but it doesn't happen for query 2&3. 

Moreover, if I put something like “map”, “group”, “project”, the performance is also poor. 


So I'm looking for some help here:


1. Is this behavior expected, or it's just bigtable or hbase that might have this issue?

2. What is the expected behavior of "query.batch"? Does the behavior that I observe mean that my "query.batch" is not taking effect?

3. Any suggestions that I can try to improve this will be greatly appreciated.





Poor performance for some simple queries - bigtable/hbase

liqingtaobkd@...
 

Hi,


We are running JanusGraph on GCP with Bigtable as the backend. I have observed some query behavior that really confuses me. Basically, I am guessing that batch fetching from the backend is not happening for some queries, even though I did set "query.batch" to true.


To start, here is my basic query. Basically it tries to trace upstream and find a subgraph.


Query 1: find a 20-level subgraph. Performance is good. 

g.V().has('node', 'fqn', 'xxxx').out('contains').repeat(__.in('flowsTo')).times(20)


Query 2: traverse until there are no incoming edges. Performance is NOT good.

g.V().has('node', 'fqn', 'xxxx').out('contains').repeat(__.in('flowsTo')).until(inE().count().is(0))


Query 3: add a vertex property filter. Performance is NOT good.

g.V().has('node', 'fqn', 'xxxx').out('contains').repeat(__.in('flowsTo').has('type', 'column')).times(20)


Query 4: instead of a vertex property filter, get back the values of the property and then filter. Performance is good.

g.V().has('node', 'fqn', 'xxxx').out('contains').repeat(__.in('flowsTo').as('a').values('type').is('column').select('a')).times(20)


Looking at the profile result (attached), the backend fetching behavior looks very different. It looks like for query 1&4, it batch-fetches from the backend, but it doesn't happen for query 2&3. 

Moreover, if I put something like “map”, “group”, “project”, the performance is also poor. 


So I'm looking for some help here:


1. Is this behavior expected, or it's just bigtable or hbase that might have this issue?

2. What is the expected behavior of "query.batch"? Does the behavior that I observe mean that my "query.batch" is not taking effect?

3. Any suggestions that I can try to improve this will be greatly appreciated.



janusgraph.properties:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend: hbase
storage.directory: null
storage.hbase.ext.google.bigtable.instance.id: my-bigtable-id
storage.hbase.ext.google.bigtable.project.id: my-project-id
storage.hbase.ext.hbase.client.connection.impl: com.google.cloud.bigtable.hbase2_x.BigtableConnection
index.search.backend: elasticsearch
index.search.hostname: elasticsearch-master
index.search.directory: null
cache.db-cache: true
cache.db-cache-clean-wait: 20
cache.db-cache-time: 600000
cache.db-cache-size: 0.2
ids.block-size: 100000
ids.renew-percentage: 0.3
query.batch: true
query.batch-property-prefetch: true
metrics.enabled: false



gremlin-server.yaml:

host: 0.0.0.0
port: 8182
threadPoolWorker: 3
gremlinPool: 64
scriptEvaluationTimeout: "300000000"
channelizer: org.apache.tinkerpop.gremlin.server.channel.WebSocketChannelizer
graphs: {
  graph: /etc/opt/janusgraph/janusgraph.properties
}
scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/init.groovy]}}}}
serializers:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
processors:
  - { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000, maxParameters: 256 }}
  - { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}
  - { className: org.apache.tinkerpop.gremlin.server.op.standard.StandardOpProcessor, config: { maxParameters: 256 }}
metrics: {
  consoleReporter: {enabled: true, interval: 180000},
  csvReporter: {enabled: false, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
  jmxReporter: {enabled: true},
  slf4jReporter: {enabled: true, interval: 180000},
  gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
  graphiteReporter: {enabled: false, interval: 180000}}
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 10000000
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536


Re: Janusgraph 0.5.3 potential memory leak

Boxuan Li
 

FYI: we recently pushed a bug fix https://github.com/JanusGraph/janusgraph/pull/2536 which might be related to the problem you encountered. This will be released in 0.6.0.

On Mar 28, 2021, at 11:00 PM, sergeymetallic@... wrote:

After rolling back the PR I mentioned in the beginning of the topic we do not experience any issues. Even back then it was not "out of memory", but the process just ate one full core of CPU and never recovered. After all the CPUs are busy we cannot make any more queries/calls to JanusGraph.


Re: Traversal binding of dynamically created graphs are not propagated in multi-node cluster

Anton Eroshenko <erosh.anton@...>
 

Marc, thanks for your help.
The way you test it is similar to how it works in my environment. I do ConfiguredGraphFactory.open("graph1") as a workaround for the second JanusGraph instance. 
But the question is about this statement in documentation
The JanusGraphManager rebinds every graph stored on the ConfigurationManagementGraph (or those for which you have created configurations) every 20 seconds. This means your graph and traversal bindings for graphs created using the ConfiguredGraphFactory will be available on all JanusGraph nodes with a maximum of a 20 second lag. It also means that a binding will still be available on a node after a server restart.
So I'm expecting that after 20 seconds the new graph traversal will be bound on all JanusGraph nodes, without explicitly opening the graph with ConfiguredGraphFactory.open() on each node. I saw in JanusGraphManager the code responsible for this dynamic rebinding, but it doesn't seem to work.


Re: Traversal binding of dynamically created graphs are not propagated in multi-node cluster

hadoopmarc@...
 

Hi Anton,

I did not feel like debugging your docker-compose file, but I could not find any test covering your scenario on github/janusgraph either, so I just replayed your scenario with the default janusgraph-full-0.5.3 distribution. These are the steps:
  1. start a cassandra-cql instance with bin/janusgraph.sh start   (ignore the gremlin server and elasticsearch that are started too)
  2. make two files conf/gremlin-server/gremlin-server-configuration8185.yaml and conf/gremlin-server/gremlin-server-configuration8186.yaml, using conf/gremlin-server/gremlin-server-configuration.yaml as a template but changing the port numbers,
  3. start two gremlin server instances with these yaml files, so serving at port 8185 and 8186
  4. make two files conf/remote8185.yaml and remote8186.yaml
  5. start two gremlin console instances and play the following:
In the first console:
gremlin> :remote connect tinkerpop.server conf/remote8185.yaml session
==>Configured localhost/127.0.0.1:8185-[3aa66b8e-8468-4cd7-95aa-0e642bb8434c]
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8185]-[3aa66b8e-8468-4cd7-95aa-0e642bb8434c] - type ':remote console' to return to local mode
gremlin> map = new HashMap<String, Object>();
gremlin> map.put("storage.backend", "cql");
==>null
gremlin> map.put("storage.hostname", "127.0.0.1");
==>null
gremlin> map.put("graph.graphname", "graph1");
==>null
gremlin> ConfiguredGraphFactory.createConfiguration(new MapConfiguration(map));
==>null
gremlin> graph1 = ConfiguredGraphFactory.open("graph1")
==>standardjanusgraph[cql:[127.0.0.1]]
gremlin> g1 = graph1.traversal()
==>graphtraversalsource[standardjanusgraph[cql:[127.0.0.1]], standard]
gremlin> g1.addV()
==>v[4136]
gremlin> g1.V()
==>v[4136]
gremlin> g1.tx().commit()
==>null
gremlin>

In the second console:
gremlin> :remote connect tinkerpop.server conf/remote8186.yaml session
==>Configured localhost/127.0.0.1:8186-[00729ace-48e0-4896-83e6-2aeb19abe84d]
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8186]-[00729ace-48e0-4896-83e6-2aeb19abe84d] - type ':remote console' to return to local mode
gremlin> graph2 = ConfiguredGraphFactory.open("graph2")
Please create configuration for this graph using the ConfigurationManagementGraph#createConfiguration API.
Type ':help' or ':h' for help.
Display stack trace? [yN]n
gremlin> graph1 = ConfiguredGraphFactory.open("graph1")
==>standardjanusgraph[cql:[127.0.0.1]]
gremlin> g1=graph1.traversal()
==>graphtraversalsource[standardjanusgraph[cql:[127.0.0.1]], standard]
gremlin> g1.V()
==>v[4136]

The assignment to graph1 differs from what is shown in the ref docs at:
https://docs.janusgraph.org/basics/configured-graph-factory/#binding-example

But otherwise the scenario you are looking for works as expected. I trust you can use it as a reference for debugging your docker-compose file.

Best wishes,    Marc


Re: Traversal binding of dynamically created graphs are not propagated in multi-node cluster

Anton Eroshenko <erosh.anton@...>
 

Hi Marc,
The environment properties in docker-compose make it work with scylla as the storage backend and with ConfiguredGraphFactory for dynamically created graphs. It works as expected except for the sync issues I described above. I attached our logs during start-up in case you'd like to look at them.



On Wed, Mar 24, 2021 at 9:20 PM Anton Eroshenko <erosh.anton@...> wrote:
Hi
We use dynamically created graphs in a multi-node JanusGraph cluster. With a single JanusGraph node it seems to work, but when we are using more than one, synchronization between JanusGraph nodes doesn't work: the gremlin server on some nodes does not recognize a newly created graph traversal. 
The documentation page says the binding takes effect on any node in the cluster with a maximum lag of 20s, but in fact the new traversal is bound only on the node we sent the request to, not on the others, no matter how long you wait. So it looks like a bug. 
We're creating a new graph with 
ConfiguredGraphFactory.create(graphName)
It is created successfully, but not propagated to other nodes. 

As a workaround I'm calling ConfiguredGraphFactory.open(graphName) on an unsynced instance, but it is not reliable since, from the Java application, you don't know which instance the load balancer will redirect you to. 

I attached a docker-compose file with which it can be reproduced. There are two JanusGraph instances; they expose different ports. But be aware that two JanusGraph instances starting up at the same time result in a concurrency error on one of the nodes, another issue of multi-node configuration. So I simply stop one of the containers on start-up and restart it later. 


Re: Janusgraph 0.5.3 potential memory leak

sergeymetallic@...
 

After rolling back the PR I mentioned in the beginning of the topic we do not experience any issues. Even back then it was not "out of memory", but the process just ate one full core of CPU and never recovered. After all the CPUs are busy we cannot make any more queries/calls to JanusGraph.


Re: Janusgraph 0.5.3 potential memory leak

Boxuan Li
 

After understanding more about the context, I feel https://gist.github.com/mad/df729c6a27a7ed224820cdd27209bade is not a fair comparison between the iterator and iterable versions, because it assumes all entries are loaded into memory at once, which isn't necessarily true in real-world scenarios where the input is an AsyncResultSet that uses paging.

The benefit of the iterator version is to avoid pre-allocating a huge chunk of memory for the byte array. I found some flaws in it (reported at https://github.com/JanusGraph/janusgraph/issues/2524#issuecomment-808857502) but am not sure whether that is the root cause or not.

@sergey, do you see any OOM exception when you encounter the issue (JG eats all the memory and becomes unresponsive)? If you could share a heap dump, that would be very helpful as well.

Best regards,
Boxuan


Re: Traversal binding of dynamically created graphs are not propagated in multi-node cluster

hadoopmarc@...
 

Hi Anton,

If I do a $  docker run janusgraph/janusgraph:latest
the logs show it runs with the berkeleyje backend.

If I look at:
https://github.com/JanusGraph/janusgraph-docker/blob/master/0.5/Dockerfile
and your docker-compose file, I cannot see how you make your janusgraph containers use the scylla/cql backend. So, check the logs of your janusgraph containers to see what they are running.

And, if this was not clear, sharing configured graphs between janusgraph instances is only possible if they share a distributed storage backend. If berkeleyje is used, each janusgraph container has its private storage backend.
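
For comparison, a docker-compose service is typically pointed at the cql backend with environment variables along these lines (a sketch following the janusgraph-docker README; the scylla hostname is a placeholder):

environment:
  JANUS_PROPS_TEMPLATE: cql
  janusgraph.storage.hostname: scylla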

Best wishes,    Marc
