OK, it took me some time to reach your level of understanding, but hopefully the
scenario below really adds to our common understanding. While the
issue hurts you in a setup with multiple gremlin servers, it already
appears in a setup with a single gremlin server.
The scenario comprises the following steps:
1. start Cassandra with:
$ bin/janusgraph.sh start
2. start gremlin server:
$ bin/gremlin-server.sh conf/gremlin-server/gremlin-server-configuration8185.yaml
3. connect with a gremlin console and run the following commands:
gremlin> :remote connect tinkerpop.server conf/remote.yaml session
==>Configured localhost/127.0.0.1:8185-[70e1320f-5c24-4804-9851-cc59db23e78e]
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8185]-[70e1320f-5c24-4804-9851-cc59db23e78e] - type ':remote console' to return to local mode
gremlin> map = new HashMap<String, Object>();
gremlin> map.put("storage.backend", "cql");
==>null
gremlin> map.put("storage.hostname", "127.0.0.1");
==>null
gremlin> map.put("graph.graphname", "graph6");
==>null
gremlin> ConfiguredGraphFactory.createConfiguration(new MapConfiguration(map));
==>null
... wait > 20 seconds
... new remote connection required for bindings to take effect
gremlin> :remote connect tinkerpop.server conf/remote8185.yaml session
==>Configured localhost/127.0.0.1:8185-[a1ddd2f3-9ab3-4eee-a415-1aa4ea57ca66]
gremlin> graph6
No such property: graph6 for class: Script8
Type ':help' or ':h' for help.
Display stack trace? [yN]n
gremlin> ConfiguredGraphFactory.getGraphNames()
==>graph5
==>graph4
==>graph3
==>graph2
==>graph1
==>graph6
gremlin>
If you now restart the gremlin server and reconnect in gremlin console,
graph6 is opened on the server and available as binding in the console.
So, indeed the automatic opening + binding of graphs as intended in line 105 of
https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-core/src/main/java/org/janusgraph/graphdb/management/JanusGraphManager.java
is somehow not functional.
Did we formulate the issue as succinctly as possible now?
Best wishes, Marc
> 1. Is this behavior expected, or it's just bigtable or hbase that might have this issue?
This (very likely) is not related to bigtable/hbase, but JanusGraph itself.
> 2. What is the expected behavior of "query.batch"? Does the behavior that I observe mean that my "query.batch" is not taking effect?
The expected behavior is that JanusGraph tries to batch the backend queries when possible (the actual implementation may depend on the storage backend you use, but at least for CQL, JanusGraph uses a thread pool to fire the backend queries concurrently).
Yes, I think the poor performance you observed is due to query.batch not taking effect. Usually this means that batch optimization for that kind of query/scenario is missing. It's not technically impossible; these are just areas that still need work. For example, the values() step can leverage batching while the valueMap() step cannot. We have an open issue for this: #2444.
> 3. Any suggestions that I can try to improve this will be greatly appreciated.
1. The best way is to help improve the JanusGraph source code in this area and contribute back to the community :P In case you are interested, a good starting point is to read JanusGraphLocalQueryOptimizerStrategy.
2. In some cases, you can split your single traversal into multiple steps and do the batching (i.e., multi-threading) yourself. In your second example, you could use BFS and do batching for each level.
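To illustrate suggestion 2, here is a minimal Python sketch of level-by-level batching. The adjacency map and fetch_in_neighbors_batch are hypothetical stand-ins for one batched backend query per level (e.g. a single multi-vertex traversal), not real JanusGraph APIs:

```python
# Toy adjacency map standing in for the graph backend; in practice each
# level would be fetched with one batched query, e.g. a single traversal
# over all frontier vertex ids at once.
IN_EDGES = {
    "d": ["b", "c"],
    "b": ["a"],
    "c": ["a"],
    "a": [],
}

def fetch_in_neighbors_batch(vertex_ids):
    """Hypothetical batched backend call: one round trip per level,
    returning the union of in-neighbors of all requested vertices."""
    result = set()
    for v in vertex_ids:
        result.update(IN_EDGES.get(v, []))
    return result

def upstream_bfs(start_ids, max_levels=20):
    """Traverse upstream level by level, issuing one batched fetch per
    level instead of one backend call per visited vertex."""
    seen = set(start_ids)
    frontier = set(start_ids)
    for _ in range(max_levels):
        if not frontier:
            break
        # fetch the whole next level in one round trip
        frontier = fetch_in_neighbors_batch(frontier) - seen
        seen.update(frontier)
    return seen
```

The point of the sketch is that the number of backend round trips grows with the depth of the traversal, not with the number of vertices visited.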
Hope this helps,
Boxuan
Hi,
We are running JanusGraph on GCP with Bigtable as the backend. I have observed some query behavior that really confuses me. Basically, I am guessing that batch fetching from the backend is not happening for some queries for some reason, even though I did set "query.batch" to true.
To start, here is my basic query. Basically, it tries to trace upstream and find a subgraph.
Query 1: find a 20-level subgraph. Performance is good.
g.V().has('node', 'fqn', 'xxxx').out('contains').repeat(__.in('flowsTo')).times(20)
Query 2: traverse until there are no incoming edges. Performance is NOT good.
g.V().has('node', 'fqn', 'xxxx').out('contains').repeat(__.in('flowsTo')).until(inE().count().is(0))
Query 3: add a vertex property filter. Performance is NOT good.
g.V().has('node', 'fqn', 'xxxx').out('contains').repeat(__.in('flowsTo').has('type', 'column')).times(20)
Query 4: instead of the vertex property filter, get back the values of the property and then filter. Performance is good.
g.V().has('node', 'fqn', 'xxxx').out('contains').repeat(__.in('flowsTo').as('a').values('type').is('column').select('a')).times(20)
Looking at the profile result (attached), the backend fetching behavior looks very different. It looks like queries 1 and 4 batch-fetch from the backend, but this doesn't happen for queries 2 and 3. Moreover, if I put in something like "map", "group", or "project", the performance is also poor.
So I'm looking for some help here:
1. Is this behavior expected, or it's just bigtable or hbase that might have this issue?
2. What is the expected behavior of "query.batch"? Does the behavior that I observe mean that my "query.batch" is not taking effect?
3. Any suggestions that I can try to improve this will be greatly appreciated.
janusgraph.properties:
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend: hbase
storage.directory: null
storage.hbase.ext.google.bigtable.instance.id: my-bigtable-id
storage.hbase.ext.google.bigtable.project.id: my-project-id
storage.hbase.ext.hbase.client.connection.impl: com.google.cloud.bigtable.hbase2_x.BigtableConnection
index.search.backend: elasticsearch
index.search.hostname: elasticsearch-master
index.search.directory: null
cache.db-cache: true
cache.db-cache-clean-wait: 20
cache.db-cache-time: 600000
cache.db-cache-size: 0.2
ids.block-size: 100000
ids.renew-percentage: 0.3
query.batch: true
query.batch-property-prefetch: true
metrics.enabled: false
gremlin-server.yaml:
host: 0.0.0.0
port: 8182
threadPoolWorker: 3
gremlinPool: 64
scriptEvaluationTimeout: "300000000"
channelizer: org.apache.tinkerpop.gremlin.server.channel.WebSocketChannelizer
graphs: {
graph: /etc/opt/janusgraph/janusgraph.properties
}
scriptEngines: {
gremlin-groovy: {
plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/init.groovy]}}}}
serializers:
- { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
processors:
- { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000, maxParameters: 256 }}
- { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}
- { className: org.apache.tinkerpop.gremlin.server.op.standard.StandardOpProcessor, config: { maxParameters: 256 }}
metrics: {
consoleReporter: {enabled: true, interval: 180000},
csvReporter: {enabled: false, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
jmxReporter: {enabled: true},
slf4jReporter: {enabled: true, interval: 180000},
gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
graphiteReporter: {enabled: false, interval: 180000}}
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 10000000
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536
On Mar 28, 2021, at 11:00 PM, sergeymetallic@... wrote:
After rolling back the PR I mentioned in the beginning of the topic we do not experience any issues. Even back then it was not "out of memory": the process just ate one full CPU core and never recovered. Once all the CPUs are busy we cannot make any more queries/calls to JanusGraph.
The JanusGraphManager rebinds every graph stored on the ConfigurationManagementGraph (or those for which you have created configurations) every 20 seconds. This means your graph and traversal bindings for graphs created using the ConfiguredGraphFactory will be available on all JanusGraph nodes with a maximum of a 20 second lag. It also means that a binding will still be available on a node after a server restart.
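This rebind mechanism can be pictured as a periodic background pass. A minimal illustrative sketch in Python (hypothetical function names, not the actual JanusGraphManager code):

```python
import threading

def rebind_once(list_graph_names, open_graph, bindings):
    """One rebind pass: open every graph known to the configuration
    store and publish a binding for the graph and for its traversal.
    open_graph is a stand-in for ConfiguredGraphFactory.open()."""
    for name in list_graph_names():
        graph = open_graph(name)
        bindings[name] = graph
        # stand-in for binding graph.traversal() under a derived name
        bindings[name + "_traversal"] = graph.upper()

def start_rebinder(list_graph_names, open_graph, bindings, interval=20.0):
    """Run rebind_once every `interval` seconds, so a graph created on
    any node becomes bound on this node with at most `interval` seconds
    of lag."""
    def loop():
        rebind_once(list_graph_names, open_graph, bindings)
        t = threading.Timer(interval, loop)
        t.daemon = True
        t.start()
    loop()
```

The 20-second figure in the docs corresponds to the `interval` here: a freshly created graph only becomes visible on other nodes after their next rebind pass.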
I did not feel like debugging your docker-compose file, but I could not find any test covering your scenario on github/janusgraph either, so I just replayed your scenario with the default janusgraph-full-0.5.3 distribution. These are the steps:
- start a cassandra-cql instance with bin/janusgraph.sh start (ignore the gremlin server and elasticsearch that are started too)
- make two files conf/gremlin-server/gremlin-server-configuration8185.yaml and conf/gremlin-server/gremlin-server-configuration8186.yaml, using conf/gremlin-server/gremlin-server-configuration.yaml as a template but changing the port numbers
- start two gremlin server instances with these yaml files, so they serve at ports 8185 and 8186
- make two files conf/remote8185.yaml and conf/remote8186.yaml
- start two gremlin console instances and play the following:
gremlin> :remote connect tinkerpop.server conf/remote8185.yaml session
==>Configured localhost/127.0.0.1:8185-[3aa66b8e-8468-4cd7-95aa-0e642bb8434c]
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8185]-[3aa66b8e-8468-4cd7-95aa-0e642bb8434c] - type ':remote console' to return to local mode
gremlin> map = new HashMap<String, Object>();
gremlin> map.put("storage.backend", "cql");
==>null
gremlin> map.put("storage.hostname", "127.0.0.1");
==>null
gremlin> map.put("graph.graphname", "graph1");
==>null
gremlin> ConfiguredGraphFactory.createConfiguration(new MapConfiguration(map));
==>null
gremlin> graph1 = ConfiguredGraphFactory.open("graph1")
==>standardjanusgraph[cql:[127.0.0.1]]
gremlin> g1 = graph1.traversal()
==>graphtraversalsource[standardjanusgraph[cql:[127.0.0.1]], standard]
gremlin> g1.addV()
==>v[4136]
gremlin> g1.V()
==>v[4136]
gremlin> g1.tx().commit()
==>null
gremlin>
In the second console:
gremlin> :remote connect tinkerpop.server conf/remote8186.yaml session
==>Configured localhost/127.0.0.1:8186-[00729ace-48e0-4896-83e6-2aeb19abe84d]
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8186]-[00729ace-48e0-4896-83e6-2aeb19abe84d] - type ':remote console' to return to local mode
gremlin> graph2 = ConfiguredGraphFactory.open("graph2")
Please create configuration for this graph using the ConfigurationManagementGraph#createConfiguration API.
Type ':help' or ':h' for help.
Display stack trace? [yN]n
gremlin> graph1 = ConfiguredGraphFactory.open("graph1")
==>standardjanusgraph[cql:[127.0.0.1]]
gremlin> g1=graph1.traversal()
==>graphtraversalsource[standardjanusgraph[cql:[127.0.0.1]], standard]
gremlin> g1.V()
==>v[4136]
The assignment to graph1 differs from what is shown in the ref docs at:
https://docs.janusgraph.org/basics/configured-graph-factory/#binding-example
But otherwise the scenario you are looking for works as expected. I trust you can use it as a reference for debugging your docker-compose file.
Best wishes, Marc
Hi,
We use dynamically created graphs in a multi-node JanusGraph cluster. With a single JanusGraph node it seems to work, but when we are using more than one, synchronization between the JanusGraph nodes doesn't work: the gremlin server on some nodes does not recognize a newly created graph traversal.
The documentation page says there is a maximum of a 20s lag for the binding to take effect on any node in the cluster, but in fact the new traversal is bound only on the node we sent the request to, not on the others, no matter how long you wait. So it looks like a bug.
We're creating a new graph with ConfiguredGraphFactory.create(graphName). It is created successfully, but not propagated to the other nodes.
As a workaround I'm calling ConfiguredGraphFactory.open(graphName) on an unsynced instance, but this is not reliable, since from a Java application you don't know which instance the load balancer will redirect you to.
I attached a docker-compose file with which this can be reproduced. There are two JanusGraph instances; they expose different ports. But be aware that two JanusGraph instances starting up at the same time result in a concurrency error on one of the nodes, another issue of the multi-node configuration. So I simply stop one of the containers on start-up and restart it later.
The benefit of the iterator version is that it avoids pre-allocating a huge chunk of memory for the byte array. I found some flaws in it (reported at https://github.com/JanusGraph/janusgraph/issues/2524#issuecomment-808857502), but I am not sure whether that is the root cause or not.
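For context, the trade-off between the two versions can be sketched as follows. This is an illustrative Python model of the two allocation strategies, not the actual StaticArrayEntryList code:

```python
def build_preallocated(entries, capacity):
    """Iterable-style: pre-allocate one large buffer up front.
    Fast to fill, but reserves `capacity` bytes even when the
    entries turn out to be small."""
    buf = bytearray(capacity)
    pos = 0
    for e in entries:
        buf[pos:pos + len(e)] = e
        pos += len(e)
    return bytes(buf[:pos])

def build_growing(entries):
    """Iterator-style: grow the buffer as entries arrive. Avoids the
    large up-front allocation, at the cost of occasional
    copy-on-grow while appending."""
    buf = bytearray()
    for e in entries:
        buf.extend(e)  # amortized growth instead of one big reservation
    return bytes(buf)
```

Both produce the same bytes; the difference is purely in peak memory versus copy overhead, which is what the benchmark below is measuring.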
@sergey, do you see any OOM exception when you encounter the issue (JG eats all the memory and becomes unresponsive)? If you could share a heap dump, that would be very helpful as well.
Best regards,
Boxuan
If I do a $ docker run janusgraph/janusgraph:latest
the logs show it runs with the berkeleyje backend.
If I look at:
https://github.com/JanusGraph/janusgraph-docker/blob/master/0.5/Dockerfile
and your docker-compose file, I cannot see how you make your janusgraph containers use the scylla/cql backend. So, check the logs of your janusgraph containers to see what they are running.
And, in case this was not clear: sharing configured graphs between janusgraph instances is only possible if they share a distributed storage backend. If berkeleyje is used, each janusgraph container has its own private storage backend.
Best wishes, Marc
My environment:
# JMH version: 1.29
# VM version: JDK 1.8.0_275, OpenJDK 64-Bit Server VM, 25.275-b01
# VM invoker: /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/bin/java
# VM options: -Dvisualvm.id=32547661350356 -Dfile.encoding=UTF-8 -Xmx1G
My dependencies:
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-core</artifactId>
    <version>1.29</version>
</dependency>
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-generator-annprocess</artifactId>
    <version>1.29</version>
    <scope>provided</scope>
</dependency>
My benchmark results:
Benchmark (size) (valueSize) Mode Cnt Score Error Units
StaticArrayEntryListBenchmark.iterable 10000 50 thrpt 5 3653.903 ± 1485.691 ops/s
StaticArrayEntryListBenchmark.iterable 10000 1000 thrpt 5 356.528 ± 100.197 ops/s
StaticArrayEntryListBenchmark.iterable 10000 5000 thrpt 5 90.776 ± 47.783 ops/s
StaticArrayEntryListBenchmark.iterable 100000 50 thrpt 5 202.407 ± 22.577 ops/s
StaticArrayEntryListBenchmark.iterable 100000 1000 thrpt 5 38.114 ± 1.196 ops/s
StaticArrayEntryListBenchmark.iterator 10000 50 thrpt 5 2079.672 ± 312.171 ops/s
StaticArrayEntryListBenchmark.iterator 10000 1000 thrpt 5 170.326 ± 33.554 ops/s
StaticArrayEntryListBenchmark.iterator 10000 5000 thrpt 5 31.522 ± 2.774 ops/s
StaticArrayEntryListBenchmark.iterator 100000 50 thrpt 5 159.831 ± 44.197 ops/s
StaticArrayEntryListBenchmark.iterator 100000 1000 thrpt 5 18.367 ± 4.123 ops/s
@Test
public void testTopic81433493() {
PropertyKey prop1 = mgmt.makePropertyKey("prop1").dataType(String.class).make();
PropertyKey prop2 = mgmt.makePropertyKey("prop2").dataType(String.class).make();
mgmt.buildIndex("comp1", Vertex.class).addKey(prop1).buildCompositeIndex();
mgmt.buildIndex("comp2", Vertex.class).addKey(prop2).buildCompositeIndex();
finishSchema();
tx.addVertex("prop1", "value-foo");
assertTrue(tx.traversal().V().has("prop1", "value-foo").hasNext());
assertTrue(tx.traversal().V().or(__.has("prop1", "value-foo"), __.has("prop2", "value-bar")).hasNext());
}
What happens if you rewrite the query to:
lmg.traversal().V(analysisVertex).out().emit().repeat(
__.in().choose(
__.hasLabel("result"),
__.has("analysisId", analysisId),
__.identity()
)
).tree().next().getTreesAtDepth(3);
I do not understand how leaving out the else clause leads to the random behavior you describe, but it won't hurt to state the intended else clause explicitly. If the else clause is not a valid case in your data model, you do not need the choose() step.
Best wishes, Marc
I'm having a strange behaviour with janusgraph and I would like to post it here and see if anyone can give me some help.
The thing is that I'm doing a tree query to get my graph data structured as a tree, and from there build the results I'm interested in. The query works fine, but the problem is that I don't get the same results every time. It doesn't make any sense that, if the graph is the same and hasn't changed, the query returns different trees, does it?
The trees I'm getting are not very different from each other. We have a node type called "group", and some other nodes called "results" hanging from these "groups"; it is just that sometimes the tree comes with the results and sometimes not, but it always has the "group" structure.
In case you want to know it, the query I'm performing is this one:
lmg.traversal().V(analysisVertex).out().emit().repeat(
__.in().choose(
__.label().is(P.eq("result")),
__.where(__.has("analysisId", analysisId))
)
).tree().next().getTreesAtDepth(3);
where starting from an "analysis" node, I filter the graph to just have a tree with the groups and the results with the analysisId I'm interested in.
I guess that is not a problem of the query itself, because when it has the results, it works fine. But I don't know why I am getting this strange inconsistent behaviour.
Any ideas about this? Thanks in advance :)
You did not answer my questions about the "id" property. TinkerPop uses a Token.ID that has the value 'id', see:
https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/T.java
I suspect that you ingested data without schema validation ("automatic schema creation"), that your input data contains an "id" property key, and that JanusGraph/TinkerPop get confused about which id is which. So I strongly suggest that you make sure this is not the root cause of the issue. To be sure, it would still be an issue, but not for you anymore :-)
Best wishes, Marc
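The ambiguity is visible in the valueMap(true) output quoted below: it contains both the element id and a user-created "id" property. In gremlin-python, element metadata is keyed by the T token enum rather than by a plain string, which is how the two can coexist. Here is an illustrative sketch using a stand-in enum (not the real TinkerPop classes):

```python
from enum import Enum

class T(Enum):
    """Stand-in for TinkerPop's T token enum; gremlin-python keys
    valueMap(True) metadata with an enum like this, not the string 'id'."""
    id = 1
    label = 2

# A valueMap(true)-style result where the element id (T.id) and a
# user-defined property named 'id' coexist, mirroring the output below.
row = {
    T.id: 201523209257056,
    T.label: "vertex",
    "id": ["19df651e-90d5-47f6-af2e-35dcb59bcc0a"],
    "type": ["id_mid_10"],
}

element_id = row[T.id]     # the element's real id
user_id_prop = row["id"]   # the user-created 'id' property
```

Because the keys differ (enum member vs. string), both survive in the map, which is exactly the confusing double "id" seen in the console output.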
gremlin> g.V().has('id','131594d6a416666b401a9e48e54ebc8f22be75e2593c5d98e2d9ecfd719d5f29').has('type','email_sha256_lowercase').valueMap(true)
==>[dpts_678:[1595548800],label:vertex,id:201523209257056,id:[19df651e-90d5-47f6-af2e-35dcb59bcc0a],type:[id_mid_10],soft_del:[false],country_GBR:[678]]
Could you please have a look?
On Mar 22, 2021, at 8:28 PM, Vinayak Bali <vinayakbali16@...> wrote:
Hi All,
Adding these properties in the configuration file affects edge traversal. Retrieving a single edge takes 7 mins of time.
1) Turn on query.batch
2) Turn off query.fast-property
The count query is faster, but edge traversal becomes more expensive. Is there any other way to improve count performance without affecting other queries?
Thanks & Regards,
Vinayak

On Fri, Mar 19, 2021 at 1:53 AM AMIYA KUMAR SAHOO <amiyakr.sahoo91@...> wrote:
Hi Vinayak,
Try below. If it works for you, you can add E2 and D similarly.
g.V().has('property1', 'A').
  outE().has('property1', 'E').as('e').inV().has('property1', 'B').
  outE().has('property1', 'E1').as('e').where(inV().has('property1', 'C')).
  select(all, 'e').fold().
  project('edgeCount', 'vertexCount').by(count(local)).by(unfold().bothV().dedup().count())
Regards,
Amiya

On Thu, 18 Mar 2021, 15:47 Vinayak Bali <vinayakbali16@...> wrote:
Amiya - I need to check the data, there is some mismatch with the counts. Consider we have more than one relation to get the count. How can we modify the query? For example, the A->E->B query is as follows:
g.V().has('property1', 'A').
  outE().has('property1','E').
  where(inV().has('property1', 'B')).fold().
  project('edgeCount', 'vertexCount').by(count(local)).by(unfold().bothV().dedup().count())
For A->E->B->E1->C->E2->D, what changes can be made in the query?
Thanks

On Thu, Mar 18, 2021 at 1:59 PM AMIYA KUMAR SAHOO <amiyakr.sahoo91@...> wrote:
Hi Vinayak,
Correct vertex count is (400332 non-unique, 34693 unique).
With g.V().has('property1', 'A').aggregate('v'), all the vertices having property1 = A might be getting included in the count in your second query because of eager evaluation (it does not matter whether they have an outE with property1 = E or not).
Regards,
Amiya
The issue still persists, and the vertex metadata is still missing for some vertices, even after enabling https://docs.janusgraph.org/advanced-topics/eventual-consistency/. Has anyone seen the same issue?
The issue is logged at https://github.com/JanusGraph/janusgraph/issues/2515
Thanks