Re: Traversal binding of dynamically created graphs is not propagated in multi-node cluster

Anton Eroshenko <erosh.anton@...>
 

Hi Mark,
I'm glad you managed to reproduce it in the Gremlin Console. But I believe you are in fact doing it with two JanusGraph servers, not with a single server as you assumed. As far as I understand, janusgraph.sh in step 1 and gremlin-server.sh in step 2 each start a JanusGraph instance, so your test scenario is close to a multi-node configuration. That's why the single-node test you mentioned could not catch this issue; with a single node it works fine.
So should I file an issue on the project's GitHub?


Re: Count Query Optimization

Vinayak Bali
 

Hi All, 

query.batch = true AND query.fast-property = true
This doesn't work; I'm facing the same problem. Is there any other way?

Thanks & Regards,
Vinayak

On Mon, Mar 22, 2021 at 6:06 PM Boxuan Li <liboxuan@...> wrote:
Have you tried keeping query.batch = true AND query.fast-property = true?

Regards,
Boxuan

On Mar 22, 2021, at 8:28 PM, Vinayak Bali <vinayakbali16@...> wrote:

Hi All,

Adding these properties to the configuration file affects edge traversal: retrieving a single edge takes 7 minutes.
1) Turn on query.batch
2) Turn off query.fast-property
The count query is faster, but edge traversal becomes more expensive.
Is there any other way to improve count performance without affecting other queries?

Thanks & Regards,
Vinayak

On Fri, Mar 19, 2021 at 1:53 AM AMIYA KUMAR SAHOO <amiyakr.sahoo91@...> wrote:
Hi Vinayak,

Try below. If it works for you, you can add E2 and D similarly.

g.V().has('property1', 'A').
  outE().has('property1', 'E').as('e').
  inV().has('property1', 'B').
  outE().has('property1', 'E1').as('e').
  where(inV().has('property1', 'C')).
  select(all, 'e').fold().
  project('edgeCount', 'vertexCount').
    by(count(local)).
    by(unfold().bothV().dedup().count())
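For readers less familiar with the fold()/project() pattern above: stripped of the graph machinery, the two numbers it produces are just "count of matching edges" and "count of distinct endpoints of those edges". A toy Python sketch of that bookkeeping (an illustration of the semantics only, not JanusGraph code; the edge-list representation is an assumption):

```python
# Toy model: each matching edge is a pair (out_vertex, in_vertex).
# edgeCount  = number of matching edges (count(local) over the folded list);
# vertexCount = distinct vertices touched by those edges (bothV().dedup().count()).
def edge_and_vertex_count(edges):
    vertices = set()
    for out_v, in_v in edges:
        vertices.add(out_v)
        vertices.add(in_v)
    return {"edgeCount": len(edges), "vertexCount": len(vertices)}

matching = [("a1", "b1"), ("a1", "b2"), ("a2", "b1")]
print(edge_and_vertex_count(matching))  # {'edgeCount': 3, 'vertexCount': 4}
```

Note that vertexCount deduplicates endpoints, which is one reason a naive aggregate-based count can disagree with it.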

Regards,
Amiya

On Thu, 18 Mar 2021, 15:47 Vinayak Bali, <vinayakbali16@...> wrote:
Amiya - I need to check the data; there is some mismatch with the counts.

Suppose we have more than one relation for which we need the count. How can we modify the query?

For example:
 
A->E->B query is as follows:
g.V().has('property1', 'A').
  outE().has('property1', 'E').
  where(inV().has('property1', 'B')).fold().
  project('edgeCount', 'vertexCount').
    by(count(local)).
    by(unfold().bothV().dedup().count())

A->E->B->E1->C->E2->D

What changes would need to be made in the query?

Thanks



On Thu, Mar 18, 2021 at 1:59 PM AMIYA KUMAR SAHOO <amiyakr.sahoo91@...> wrote:
Hi Vinayak,

The correct vertex count is (400332 non-unique, 34693 unique).

With g.V().has('property1', 'A').aggregate('v'), all vertices having property1 = A are probably included in the count in your second query because of eager evaluation (whether or not they have an outE with property1 = E).

Regards,
Amiya



Re: Traversal binding of dynamically created graphs is not propagated in multi-node cluster

hadoopmarc@...
 

You could also check the scenario at line 65 of:

https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-server/src/test/java/org/janusgraph/graphdb/tinkerpop/ConfigurationManagementGraphServerTest.java

This is with the inmemory storage backend rather than cassandra.

Marc


Re: Traversal binding of dynamically created graphs is not propagated in multi-node cluster

hadoopmarc@...
 

Hi Anton,

OK, it took me some time to reach your level of understanding, but hopefully the
scenario below adds to our common understanding. While the issue hurts you in a
setup with multiple Gremlin Servers, it already appears in a setup with a single
Gremlin Server.

The scenario comprises the following steps:
1. start Cassandra with:
   $ bin/janusgraph.sh start
   
2. start gremlin server:
   $ bin/gremlin-server.sh conf/gremlin-server/gremlin-server-configuration8185.yaml
   
3. connect with a gremlin console and run the following commands:

gremlin> :remote connect tinkerpop.server conf/remote.yaml session
==>Configured localhost/127.0.0.1:8185-[70e1320f-5c24-4804-9851-cc59db23e78e]
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8185]-[70e1320f-5c24-4804-9851-cc59db23e78e] - type ':remote console' to return to local mode
gremlin> map = new HashMap<String, Object>();
gremlin> map.put("storage.backend", "cql");
==>null
gremlin> map.put("storage.hostname", "127.0.0.1");
==>null
gremlin> map.put("graph.graphname", "graph6");
==>null
gremlin> ConfiguredGraphFactory.createConfiguration(new MapConfiguration(map));
==>null

... wait > 20 seconds
... new remote connection required for bindings to take effect

gremlin> :remote connect tinkerpop.server conf/remote8185.yaml session
==>Configured localhost/127.0.0.1:8185-[a1ddd2f3-9ab3-4eee-a415-1aa4ea57ca66]
gremlin> graph6
No such property: graph6 for class: Script8
Type ':help' or ':h' for help.
Display stack trace? [yN]n
gremlin> ConfiguredGraphFactory.getGraphNames()
==>graph5
==>graph4
==>graph3
==>graph2
==>graph1
==>graph6
gremlin>

If you now restart the gremlin server and reconnect in gremlin console,
graph6 is opened on the server and available as binding in the console.

So, indeed the automatic opening + binding of graphs as intended in line 105 of
https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-core/src/main/java/org/janusgraph/graphdb/management/JanusGraphManager.java
is somehow not functional.

Did we formulate the issue as succinctly as possible now?

Best wishes,     Marc


Re: Poor performance for some simple queries - bigtable/hbase

Boxuan Li
 

Hi,

> 1. Is this behavior expected, or it's just bigtable or hbase that might have this issue?


This is (very likely) not related to Bigtable/HBase, but to JanusGraph itself.


> 2. What is the expected behavior of "query.batch"? Does the behavior that I observe mean that my "query.batch" is not taking effect?


The expected behavior is that it tries to batch the backend queries when possible (the actual implementation may depend on the storage backend you use, but at least for CQL, JanusGraph uses a thread pool to fire the backend queries concurrently).


Yes, I think the poor performance you observed is due to query.batch not taking effect. Usually this means the batch optimization for that kind of query/scenario is missing. It's not technically impossible; these are just areas that need work. For example, the values() step can leverage batching while the valueMap() step cannot. We have an open issue for this: #2444.


> 3. Any suggestions that I can try to improve this will be greatly appreciated.


1. The best way is to help improve the JanusGraph source code in this area and contribute back to the community :P In case you are interested, a good starting point is to read JanusGraphLocalQueryOptimizerStrategy.


2. In some cases, you could split your single traversal into multiple steps and do the batching (i.e. multi-threading) yourself. In your second example, you could use BFS and do batching for each level.
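The BFS-per-level idea in suggestion 2 can be sketched outside JanusGraph. Below is a toy Python version in which fetch_in_neighbors is a hypothetical stand-in for a per-vertex backend lookup (it is not a JanusGraph API), and each BFS frontier is fetched concurrently with a thread pool:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy adjacency map standing in for the graph backend; each in('flowsTo')
# expansion would be one backend lookup per vertex, so we batch a whole level.
GRAPH = {"d": ["c1", "c2"], "c1": ["b"], "c2": ["b"], "b": ["a"], "a": []}

def fetch_in_neighbors(v):
    # Hypothetical stand-in for a remote per-vertex backend query.
    return GRAPH.get(v, [])

def bfs_batched(start, max_depth):
    seen = {start}
    frontier = [start]
    with ThreadPoolExecutor(max_workers=8) as pool:
        for _ in range(max_depth):
            if not frontier:
                break
            # Fire all lookups for this level concurrently ("batching").
            results = pool.map(fetch_in_neighbors, frontier)
            next_frontier = []
            for neighbors in results:
                for n in neighbors:
                    if n not in seen:
                        seen.add(n)
                        next_frontier.append(n)
            frontier = next_frontier
    return seen

print(sorted(bfs_batched("d", 20)))  # ['a', 'b', 'c1', 'c2', 'd']
```

The point is only the shape of the loop: one round-trip batch per level instead of one per vertex, which is what the repeat() traversal degenerates to when the batching optimization is missing.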


Hope this helps,

Boxuan


On Thu, Apr 1, 2021 at 2:05 AM, <liqingtaobkd@...> wrote:

Hi,


We are running JanusGraph on GCP with Bigtable as the backend. I have observed some query behavior that really confuses me. Basically, I am guessing that batch fetching from the backend is not happening for some queries, even though I did set "query.batch" to true.


To start, here is my basic query. Basically it tries to trace upstream and find a subgraph.


Query 1: find the 20-level subgraph. Performance is good.

g.V().has('node', 'fqn', 'xxxx').out('contains').repeat(__.in('flowsTo')).times(20)


Query 2: traverse until there are no incoming edges. Performance is NOT good.

g.V().has('node', 'fqn', 'xxxx').out('contains').repeat(__.in('flowsTo')).until(inE().count().is(0))


Query 3: add a vertex property filter. Performance is NOT good.

g.V().has('node', 'fqn', 'xxxx').out('contains').repeat(__.in('flowsTo').has('type', 'column')).times(20)


Query 4: instead of a vertex property filter, get back the values of the property and then filter. Performance is good.

g.V().has('node', 'fqn', 'xxxx').out('contains').repeat(__.in('flowsTo').as('a').values('type').is('column').select('a')).times(20)


Looking at the profile result (attached), the backend fetching behavior looks very different: queries 1 and 4 batch-fetch from the backend, but queries 2 and 3 do not.

Moreover, if I add steps like map, group, or project, performance is also poor.


So I'm looking for some help here:


1. Is this behavior expected, or it's just bigtable or hbase that might have this issue?

2. What is the expected behavior of "query.batch"? Does the behavior that I observe mean that my "query.batch" is not taking effect?

3. Any suggestions that I can try to improve this will be greatly appreciated.



janusgraph.properties:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend: hbase
storage.directory: null
storage.hbase.ext.google.bigtable.instance.id: my-bigtable-id
storage.hbase.ext.google.bigtable.project.id: my-project-id
storage.hbase.ext.hbase.client.connection.impl: com.google.cloud.bigtable.hbase2_x.BigtableConnection
index.search.backend: elasticsearch
index.search.hostname: elasticsearch-master
index.search.directory: null
cache.db-cache: true
cache.db-cache-clean-wait: 20
cache.db-cache-time: 600000
cache.db-cache-size: 0.2
ids.block-size: 100000
ids.renew-percentage: 0.3
query.batch: true
query.batch-property-prefetch: true
metrics.enabled: false

gremlin-server.yaml:

host: 0.0.0.0
port: 8182
threadPoolWorker: 3
gremlinPool: 64
scriptEvaluationTimeout: "300000000"
channelizer: org.apache.tinkerpop.gremlin.server.channel.WebSocketChannelizer
graphs: {
  graph: /etc/opt/janusgraph/janusgraph.properties
}
scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/init.groovy]}}}}
serializers:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
processors:
  - { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000, maxParameters: 256 }}
  - { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}
  - { className: org.apache.tinkerpop.gremlin.server.op.standard.StandardOpProcessor, config: { maxParameters: 256 }}
metrics: {
  consoleReporter: {enabled: true, interval: 180000},
  csvReporter: {enabled: false, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
  jmxReporter: {enabled: true},
  slf4jReporter: {enabled: true, interval: 180000},
  gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
  graphiteReporter: {enabled: false, interval: 180000}}
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 10000000
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536



Re: Janusgraph 0.5.3 potential memory leak

Boxuan Li
 

FYI: we recently pushed a bug fix https://github.com/JanusGraph/janusgraph/pull/2536 which might be related to the problem you encountered. This will be released in 0.6.0.

On Mar 28, 2021, at 11:00 PM, sergeymetallic@... wrote:

After rolling back the PR I mentioned at the beginning of the topic, we do not experience any issues. Even back then it was not "out of memory": the process just ate one full CPU core and never recovered. Once all the CPUs are busy, we cannot make any more queries/calls to JanusGraph.


Re: Traversal binding of dynamically created graphs is not propagated in multi-node cluster

Anton Eroshenko <erosh.anton@...>
 

Marc, thanks for your help.
The way you test it is similar to how it works in my environment. I do ConfiguredGraphFactory.open("graph1") as a workaround for the second JanusGraph instance. 
But the question is about this statement in documentation
The JanusGraphManager rebinds every graph stored on the ConfigurationManagementGraph (or those for which you have created configurations) every 20 seconds. This means your graph and traversal bindings for graphs created using the ConfiguredGraphFactory will be available on all JanusGraph nodes with a maximum of a 20 second lag. It also means that a binding will still be available on a node after a server restart.
 So I'm expecting that after 20 seconds the new graph traversal will be bound on all JanusGraph nodes without explicitly opening the graph with ConfiguredGraphFactory.open() on each node. I saw the code in JanusGraphManager responsible for this dynamic rebinding, but it doesn't seem to work.


Re: Traversal binding of dynamically created graphs is not propagated in multi-node cluster

hadoopmarc@...
 

Hi Anton,

I did not feel like debugging your docker-compose file, but I could not find any test covering your scenario on github/janusgraph either, so I just replayed your scenario with the default janusgraph-full-0.5.3 distribution. These are the steps:
  1. start a cassandra-cql instance with bin/janusgraph.sh start   (ignore the gremlin server and elasticsearch that are started too)
  2. make two files conf/gremlin-server/gremlin-server-configuration8185.yaml and conf/gremlin-server/gremlin-server-configuration8186.yaml, using conf/gremlin-server/gremlin-server-configuration.yaml as a template but changing the port numbers,
  3. start two gremlin server instances with these yaml files, so serving at port 8185 and 8186
  4. make two files conf/remote8185.yaml and conf/remote8186.yaml
  5. start two gremlin console instances and play the following:
In the first console:
gremlin> :remote connect tinkerpop.server conf/remote8185.yaml session
==>Configured localhost/127.0.0.1:8185-[3aa66b8e-8468-4cd7-95aa-0e642bb8434c]
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8185]-[3aa66b8e-8468-4cd7-95aa-0e642bb8434c] - type ':remote console' to return to local mode
gremlin> map = new HashMap<String, Object>();
gremlin> map.put("storage.backend", "cql");
==>null
gremlin> map.put("storage.hostname", "127.0.0.1");
==>null
gremlin> map.put("graph.graphname", "graph1");
==>null
gremlin> ConfiguredGraphFactory.createConfiguration(new MapConfiguration(map));
==>null
gremlin> graph1 = ConfiguredGraphFactory.open("graph1")
==>standardjanusgraph[cql:[127.0.0.1]]
gremlin> g1 = graph1.traversal()
==>graphtraversalsource[standardjanusgraph[cql:[127.0.0.1]], standard]
gremlin> g1.addV()
==>v[4136]
gremlin> g1.V()
==>v[4136]
gremlin> g1.tx().commit()
==>null
gremlin>

In the second console:
gremlin> :remote connect tinkerpop.server conf/remote8186.yaml session
==>Configured localhost/127.0.0.1:8186-[00729ace-48e0-4896-83e6-2aeb19abe84d]
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8186]-[00729ace-48e0-4896-83e6-2aeb19abe84d] - type ':remote console' to return to local mode
gremlin> graph2 = ConfiguredGraphFactory.open("graph2")
Please create configuration for this graph using the ConfigurationManagementGraph#createConfiguration API.
Type ':help' or ':h' for help.
Display stack trace? [yN]n
gremlin> graph1 = ConfiguredGraphFactory.open("graph1")
==>standardjanusgraph[cql:[127.0.0.1]]
gremlin> g1=graph1.traversal()
==>graphtraversalsource[standardjanusgraph[cql:[127.0.0.1]], standard]
gremlin> g1.V()
==>v[4136]

The assignment to graph1 differs from what is shown in the ref docs at:
https://docs.janusgraph.org/basics/configured-graph-factory/#binding-example

But otherwise the scenario you are looking for works as expected. I trust you can use it as a reference for debugging your docker-compose file.

Best wishes,    Marc


Re: Traversal binding of dynamically created graphs is not propagated in multi-node cluster

Anton Eroshenko <erosh.anton@...>
 

Hi Marc,
The environment properties in docker-compose make it work with Scylla as the storage backend and with ConfiguredGraphFactory for dynamically created graphs. It works as expected except for the sync issues I described above. I attached our start-up logs in case you'd like to take a look.



On Wed, Mar 24, 2021 at 9:20 PM Anton Eroshenko <erosh.anton@...> wrote:
Hi
We use dynamically created graphs in a multi-node JanusGraph cluster. With a single JanusGraph node it seems to work, but when we use more than one, synchronization between JanusGraph nodes doesn't work: the Gremlin Server on some nodes does not recognize a newly created graph traversal.
The documentation page says there is a maximum of a 20-second lag for the binding to take effect on any node in the cluster, but in fact the new traversal is bound only on the node we sent the request to, not on the others, no matter how long you wait. So it looks like a bug.
We're creating a new graph with 
ConfiguredGraphFactory.create(graphName)
It is created successfully, but not propagated to other nodes. 

As a workaround I'm calling ConfiguredGraphFactory.open(graphName) on an unsynced instance, but it is not reliable since, from the Java application, you don't know which instance the load balancer will redirect you to.

I attached a docker-compose file with which it can be reproduced. There are two JanusGraph instances; they expose different ports. But be aware that two JanusGraph instances starting up at the same time result in a concurrency error on one of the nodes, another issue of the multi-node configuration. So I simply stop one of the containers on start-up and restart it later.



Re: Janusgraph 0.5.3 potential memory leak

Boxuan Li
 

After understanding more about the context, I feel https://gist.github.com/mad/df729c6a27a7ed224820cdd27209bade is not a fair comparison between the iterator and iterable versions, because it assumes all entries are loaded into memory at once, which isn't necessarily true in real-world scenarios where the input is an AsyncResultSet that uses paging.

The benefit of the iterator version is that it avoids pre-allocating a huge chunk of memory for the byte array. I found some flaws in it (reported at https://github.com/JanusGraph/janusgraph/issues/2524#issuecomment-808857502) but I'm not sure whether that is the root cause.
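The iterator-vs-iterable trade-off described here is the classic lazy-vs-materialized one. A toy Python sketch (purely illustrative, not JanusGraph's StaticArrayEntryList; the entry generator simulates paged results):

```python
def entries(n):
    # Simulates paged results arriving one at a time (like an AsyncResultSet).
    for i in range(n):
        yield i.to_bytes(4, "big")

# Iterable/materialized style: every entry buffered up front before processing.
def total_size_materialized(n):
    all_entries = list(entries(n))          # allocates n objects at once
    return sum(len(e) for e in all_entries)

# Iterator style: consume one entry at a time, O(1) extra memory.
def total_size_streaming(n):
    return sum(len(e) for e in entries(n))

assert total_size_materialized(1000) == total_size_streaming(1000) == 4000
```

Both compute the same answer; the difference a benchmark should capture is peak memory when the input is paged, which is exactly what the gist's setup hides by pre-loading all entries.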

@sergey, do you see any OOM exception when you encounter the issue (JG eats all the memory and becomes unresponsive)? If you could share a heap dump, that would be very helpful as well.

Best regards,
Boxuan


Re: Traversal binding of dynamically created graphs is not propagated in multi-node cluster

hadoopmarc@...
 

Hi Anton,

If I do a $ docker run janusgraph/janusgraph:latest
the logs show that it runs with the berkeleyje backend.

If I look at:
https://github.com/JanusGraph/janusgraph-docker/blob/master/0.5/Dockerfile
and your docker-compose file, I cannot see how you make your JanusGraph containers use the scylla/cql backend. So, check the logs of your JanusGraph containers to see what they are running.

And, in case this was not clear: sharing configured graphs between JanusGraph instances is only possible if they share a distributed storage backend. If berkeleyje is used, each JanusGraph container has its own private storage backend.

Best wishes,    Marc


Re: Janusgraph 0.5.3 potential memory leak

Boxuan Li
 

Can someone share how you ran the benchmark provided by @mad (e.g. which JMH version and which JanusGraph version you are using)? I ran the benchmark on master (f19df6) but I see OOM errors for both the iterator and iterable versions. Furthermore, I don't see any OOM report in the final result (JMH simply omits runs with exceptions from the final report).

My environment:

# JMH version: 1.29
# VM version: JDK 1.8.0_275, OpenJDK 64-Bit Server VM, 25.275-b01
# VM invoker: /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/bin/java
# VM options: -Dvisualvm.id=32547661350356 -Dfile.encoding=UTF-8 -Xmx1G

My dependencies:

<dependency>
  <groupId>org.openjdk.jmh</groupId>
  <artifactId>jmh-core</artifactId>
  <version>1.29</version>
</dependency>
<dependency>
  <groupId>org.openjdk.jmh</groupId>
  <artifactId>jmh-generator-annprocess</artifactId>
  <version>1.29</version>
  <scope>provided</scope>
</dependency>

My benchmark results:

Benchmark                               (size)  (valueSize)   Mode  Cnt     Score      Error  Units
StaticArrayEntryListBenchmark.iterable   10000           50  thrpt    5  3653.903 ± 1485.691  ops/s
StaticArrayEntryListBenchmark.iterable   10000         1000  thrpt    5   356.528 ±  100.197  ops/s
StaticArrayEntryListBenchmark.iterable   10000         5000  thrpt    5    90.776 ±   47.783  ops/s
StaticArrayEntryListBenchmark.iterable  100000           50  thrpt    5   202.407 ±   22.577  ops/s
StaticArrayEntryListBenchmark.iterable  100000         1000  thrpt    5    38.114 ±    1.196  ops/s
StaticArrayEntryListBenchmark.iterator   10000           50  thrpt    5  2079.672 ±  312.171  ops/s
StaticArrayEntryListBenchmark.iterator   10000         1000  thrpt    5   170.326 ±   33.554  ops/s
StaticArrayEntryListBenchmark.iterator   10000         5000  thrpt    5    31.522 ±    2.774  ops/s
StaticArrayEntryListBenchmark.iterator  100000           50  thrpt    5   159.831 ±   44.197  ops/s
StaticArrayEntryListBenchmark.iterator  100000         1000  thrpt    5    18.367 ±    4.123  ops/s


Re: Duplicate Vertex

Boxuan Li
 

I couldn't reproduce this on the v0.4 branch using the code below:


@Test
public void testTopic81433493() {
    PropertyKey prop1 = mgmt.makePropertyKey("prop1").dataType(String.class).make();
    PropertyKey prop2 = mgmt.makePropertyKey("prop2").dataType(String.class).make();
    mgmt.buildIndex("comp1", Vertex.class).addKey(prop1).buildCompositeIndex();
    mgmt.buildIndex("comp2", Vertex.class).addKey(prop2).buildCompositeIndex();
    finishSchema();

    tx.addVertex("prop1", "value-foo");
    assertTrue(tx.traversal().V().has("prop1", "value-foo").hasNext());
    assertTrue(tx.traversal().V().or(__.has("prop1", "value-foo"), __.has("prop2", "value-bar")).hasNext());
}




Re: Query not returning always the same result

hadoopmarc@...
 

Hi Adrian,

What happens if you rewrite the query to:

lmg.traversal().V(analysisVertex).out().emit().repeat(
                __.in().choose(
                        __.hasLabel("result"),
                        __.has("analysisId", analysisId),
                        __.identity()
                )
        ).tree().next().getTreesAtDepth(3);

I do not understand how leaving out the else clause leads to the random behavior you describe, but it won't hurt to state the intended else clause explicitly. If the else clause is not a valid case in your data model, you do not need the choose() step.

Best wishes,   Marc


Query not returning always the same result

Adrián Abalde Méndez <aabalde@...>
 

Hello,

I'm having some strange behaviour with JanusGraph and I would like to post it here to see if anyone can give me some help.

The thing is that I'm doing a tree query to get my graph data structured as a tree, and from there build the results I'm interested in. This query works fine, but the problem is that I don't get the same results every time. It doesn't make any sense that the query returns different trees if the graph is the same and hasn't changed, does it?

The two trees I'm getting are not very different from each other. We have a node type called "group", and some other nodes called "results" hanging from these "groups"; it's just that sometimes the tree comes with the results and sometimes not, though it always has the "group" structure.

In case you want to know it, the query I'm performing is this one:


lmg.traversal().V(analysisVertex).out().emit().repeat(
                __.in().choose(
                        __.label().is(P.eq("result")),
                        __.where(__.has("analysisId", analysisId))
                )
        ).tree().next().getTreesAtDepth(3);


where, starting from an "analysis" node, I filter the graph down to a tree with the groups and the results with the analysisId I'm interested in.

I guess that is not a problem of the query itself, because when it has the results, it works fine. But I don't know why I am getting this strange inconsistent behaviour.

Any ideas about this? Thanks in advance :)

Best regards,
Adrian


Re: Multiple vertices generated for the same index value and vertex properties missing with RF3

hadoopmarc@...
 

Hi,
You did not answer my questions about the "id" property. TinkerPop uses a Token.ID that has the value 'id', see:

https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/T.java

I suspect that you ingested data without schema validation ("automatic schema creation"), that your input data contains an "id" property key, and that JanusGraph/TinkerPop gets confused about which id is which. So I strongly suggest you make sure this is not the root cause of the issue. To be sure, it would still be an issue, but not for you anymore :-)

Best wishes,    Marc


Re: Multiple vertices generated for the same index value and vertex properties missing with RF3

sauverma
 

Another really strange observation

gremlin> g.V().has('id','131594d6a416666b401a9e48e54ebc8f22be75e2593c5d98e2d9ecfd719d5f29').has('type','email_sha256_lowercase').valueMap(true)
==>[dpts_678:[1595548800],label:vertex,id:201523209257056,id:[19df651e-90d5-47f6-af2e-35dcb59bcc0a],type:[id_mid_10],soft_del:[false],country_GBR:[678]]


Could you please have a look?
