Re: Traversal binding of dynamically created graphs are not propagated in multi-node cluster
hadoopmarc@...
Hi Anton,
No, my last post only concerned the Gremlin Server on port 8185, although the first line of step 3 should have read (this was a hand-editing error):

:remote connect tinkerpop.server conf/remote8185.yaml session

The Gremlin Server on port 8182 from janusgraph.sh is ignored.

Anyway, the link to the successful test on GitHub actually held the key to some more insight. It turns out that our issue (bindings are not automatically generated after at most 20 seconds) is absent if you use the sequence createTemplateConfiguration() and create(). Unfortunately, this only holds on the same server where the new configuration was created. So, I will report all of this as an issue and you can comment on it if necessary.

Best wishes, Marc
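For reference, a minimal sketch of the createTemplateConfiguration() + create() sequence mentioned above, reusing the map values from the earlier transcripts ("graph7" is only an illustrative name):

gremlin> map = new HashMap<String, Object>();
gremlin> map.put("storage.backend", "cql");
gremlin> map.put("storage.hostname", "127.0.0.1");
gremlin> // note: a template configuration must NOT contain graph.graphname
gremlin> ConfiguredGraphFactory.createTemplateConfiguration(new MapConfiguration(map));
gremlin> graph7 = ConfiguredGraphFactory.create("graph7");  // instantiates graph7 from the template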
Re: Traversal binding of dynamically created graphs are not propagated in multi-node cluster
Anton Eroshenko <erosh.anton@...>
Hi Marc, I'm glad that you managed to reproduce it in the Gremlin Console. But I believe that you are in fact doing it with two JanusGraph servers, not with a single server as you assumed. As far as I understand, janusgraph.sh in step 1 and gremlin-server.sh in step 2 both start a JanusGraph instance. So I think your test scenario is close to a multi-node configuration. That's why the single-node test you mentioned could not catch this issue; for a single node it works fine. So, should I file an issue on the project's GitHub?
Re: Count Query Optimization
Vinayak Bali
Hi All,

Setting query.batch = true AND query.fast-property = true doesn't work; I am facing the same problem. Is there any other way?

Thanks & Regards, Vinayak
On Mon, Mar 22, 2021 at 6:06 PM Boxuan Li <liboxuan@...> wrote:
Re: Traversal binding of dynamically created graphs are not propagated in multi-node cluster
hadoopmarc@...
You could also check the scenario at line 65 of:
https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-server/src/test/java/org/janusgraph/graphdb/tinkerpop/ConfigurationManagementGraphServerTest.java

This one uses the inmemory storage backend rather than Cassandra.

Marc
Re: Traversal binding of dynamically created graphs are not propagated in multi-node cluster
hadoopmarc@...
Hi Anton,
OK, it took me some time to reach your level of understanding, but hopefully the scenario below really adds to our common understanding. While the issue hurts you in a setup with multiple Gremlin Servers, it already appears in a setup with a single Gremlin Server. The scenario comprises the following steps:

1. Start Cassandra with:

$ bin/janusgraph.sh start

2. Start Gremlin Server:

$ bin/gremlin-server.sh conf/gremlin-server/gremlin-server-configuration8185.yaml

3. Connect with a Gremlin Console and run the following commands:

gremlin> :remote connect tinkerpop.server conf/remote.yaml session
==>Configured localhost/127.0.0.1:8185-[70e1320f-5c24-4804-9851-cc59db23e78e]
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8185]-[70e1320f-5c24-4804-9851-cc59db23e78e] - type ':remote console' to return to local mode
gremlin> map = new HashMap<String, Object>();
gremlin> map.put("storage.backend", "cql");
==>null
gremlin> map.put("storage.hostname", "127.0.0.1");
==>null
gremlin> map.put("graph.graphname", "graph6");
==>null
gremlin> ConfiguredGraphFactory.createConfiguration(new MapConfiguration(map));
==>null

... wait > 20 seconds ... a new remote connection is required for bindings to take effect

gremlin> :remote connect tinkerpop.server conf/remote8185.yaml session
==>Configured localhost/127.0.0.1:8185-[a1ddd2f3-9ab3-4eee-a415-1aa4ea57ca66]
gremlin> graph6
No such property: graph6 for class: Script8
Type ':help' or ':h' for help.
Display stack trace? [yN]n
gremlin> ConfiguredGraphFactory.getGraphNames()
==>graph5
==>graph4
==>graph3
==>graph2
==>graph1
==>graph6

If you now restart the Gremlin Server and reconnect in the Gremlin Console, graph6 is opened on the server and available as a binding in the console. So, indeed, the automatic opening + binding of graphs as intended in line 105 of
https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-core/src/main/java/org/janusgraph/graphdb/management/JanusGraphManager.java
is somehow not functional.

Did we formulate the issue as succinctly as possible now?

Best wishes, Marc
Re: Poor performance for some simple queries - bigtable/hbase
Boxuan Li
Hi,

> 1. Is this behavior expected, or it's just bigtable or hbase that might have this issue?

This (very likely) is not related to Bigtable/HBase, but to JanusGraph itself.

> 2. What is the expected behavior of "query.batch"? Does the behavior that I observe mean that my "query.batch" is not taking effect?

The expected behavior is that it tries to batch the backend queries where possible (the real implementation may depend on the storage backend you use, but at least for CQL, JanusGraph uses a thread pool to fire the backend queries concurrently). Yes, I think the poor performance you observed is due to query.batch not taking effect. Usually this means that batch optimization for that kind of query/scenario is missing. It's not technically impossible - these are just areas that need to be worked on. For example, the values() step can leverage batching while the valueMap() step cannot. We have an open issue for this: #2444.

> 3. Any suggestions that I can try to improve this will be greatly appreciated.

1. The best way is to help improve the JanusGraph source code in this area and contribute it back to the community :P In case you are interested, a good starting point is to read JanusGraphLocalQueryOptimizerStrategy.
2. In some cases, you could split your single traversal into multiple steps and do the batching (i.e. multi-threading) yourself. In your second example, you could use BFS and do the batching for each level; see the sketch below.

Hope this helps,
Boxuan

On Thu, Apr 1, 2021 at 2:05 AM <liqingtaobkd@...> wrote:
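To illustrate point 2 above, a minimal Gremlin-Groovy sketch of doing the per-level batching yourself. It assumes a traversal source g; the labels and the 'xxxx' start value follow the queries in the original post below, and this loop is only one way to organize the BFS:

// Walk the 'flowsTo' edges upstream one BFS level at a time, so each level
// is fetched with a single batched traversal instead of one lookup per vertex.
def frontier = g.V().has('node', 'fqn', 'xxxx').out('contains').id().toList()
def seen = new HashSet(frontier)
while (!frontier.isEmpty()) {
    def next = g.V(frontier.toArray()).in('flowsTo').dedup().id().toList()
    next.removeAll(seen)      // drop vertices already visited at an earlier level
    seen.addAll(next)
    frontier = next
}
// 'seen' now holds the vertex ids of the whole upstream subgraph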
Poor performance for some simple queries - bigtable/hbase
liqingtaobkd@...
Hi,

We are running JanusGraph on GCP with Bigtable as the backend. I have observed some query behavior that really confuses me. Basically, I am guessing batch fetching from the backend is not happening for some queries for some reason, even though I did set "query.batch" to true. To start, here is my basic query; it tries to trace upstream and find a subgraph.

Query 1: find the 20-level subgraph. Performance is good.

g.V().has('node', 'fqn', 'xxxx').out('contains').repeat(__.in('flowsTo')).times(20)

Query 2: repeat until there are no incoming edges. Performance is NOT good.

g.V().has('node', 'fqn', 'xxxx').out('contains').repeat(__.in('flowsTo')).until(inE().count().is(0))

Query 3: add a vertex property filter. Performance is NOT good.

g.V().has('node', 'fqn', 'xxxx').out('contains').repeat(__.in('flowsTo').has('type', 'column')).times(20)

Query 4: instead of a vertex property filter, get back the values of the property and then filter. Performance is good.

g.V().has('node', 'fqn', 'xxxx').out('contains').repeat(__.in('flowsTo').as('a').values('type').is('column').select('a')).times(20)
Moreover, if I put in something like "map", "group", or "project", the performance is also poor. So I'm looking for some help here:

1. Is this behavior expected, or is it just Bigtable or HBase that might have this issue?
2. What is the expected behavior of "query.batch"? Does the behavior that I observe mean that my "query.batch" is not taking effect?
3. Any suggestions that I can try to improve this will be greatly appreciated.

janusgraph.properties:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend: hbase
storage.directory: null
storage.hbase.ext.google.bigtable.instance.id: my-bigtable-id
storage.hbase.ext.google.bigtable.project.id: my-project-id
storage.hbase.ext.hbase.client.connection.impl: com.google.cloud.bigtable.hbase2_x.BigtableConnection
index.search.backend: elasticsearch
index.search.hostname: elasticsearch-master
index.search.directory: null
cache.db-cache: true
cache.db-cache-clean-wait: 20
cache.db-cache-time: 600000
cache.db-cache-size: 0.2
ids.block-size: 100000
ids.renew-percentage: 0.3
query.batch: true
query.batch-property-prefetch: true
metrics.enabled: false

gremlin-server.yaml:

host: 0.0.0.0
port: 8182
threadPoolWorker: 3
gremlinPool: 64
scriptEvaluationTimeout: "300000000"
channelizer: org.apache.tinkerpop.gremlin.server.channel.WebSocketChannelizer
graphs: { graph: /etc/opt/janusgraph/janusgraph.properties }
scriptEngines: {
  gremlin-groovy: {
    plugins: {
      org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
      org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
      org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
      org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
      org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/init.groovy]}}}}
serializers:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
processors:
  - { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000, maxParameters: 256 }}
  - { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}
  - { className: org.apache.tinkerpop.gremlin.server.op.standard.StandardOpProcessor, config: { maxParameters: 256 }}
metrics: {
  consoleReporter: {enabled: true, interval: 180000},
  csvReporter: {enabled: false, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
  jmxReporter: {enabled: true},
  slf4jReporter: {enabled: true, interval: 180000},
  gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
  graphiteReporter: {enabled: false, interval: 180000}}
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 10000000
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536
Re: Janusgraph 0.5.3 potential memory leak
Boxuan Li
FYI: we recently pushed a bug fix https://github.com/JanusGraph/janusgraph/pull/2536 which might be related to the problem you encountered. This will be released in 0.6.0.
Re: Traversal binding of dynamically created graphs are not propagated in multi-node cluster
Anton Eroshenko <erosh.anton@...>
Marc, thanks for your help. The way you test it is similar to how it works in my environment. I do ConfiguredGraphFactory.open("graph1") as a workaround for the second JanusGraph instance. But the question is about this statement in the documentation:

"The JanusGraphManager rebinds every graph stored on the ConfigurationManagementGraph (or those for which you have created configurations) every 20 seconds. This means your graph and traversal bindings for graphs created using the ConfiguredGraphFactory will be available on all JanusGraph nodes with a maximum of a 20 second lag. It also means that a binding will still be available on a node after a server restart."

So I'm expecting that after 20 seconds the new graph traversal will be bound on all JanusGraph nodes without explicitly opening the graph with ConfiguredGraphFactory.open() on each node. I saw the code responsible for this dynamic rebinding in JanusGraphManager, but it doesn't seem to work.
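Until the automatic rebinding works, one way to make the workaround deterministic is to submit the open() call to every node directly, bypassing the load balancer. A minimal Gremlin-Groovy sketch, where the host names are hypothetical placeholders:

import org.apache.tinkerpop.gremlin.driver.Cluster

// Force the binding on each JanusGraph node by connecting to each host directly.
// 'janusgraph-node1'/'janusgraph-node2' stand in for the real host names.
['janusgraph-node1', 'janusgraph-node2'].each { host ->
    def cluster = Cluster.build(host).port(8182).create()
    def client = cluster.connect()
    try {
        client.submit('ConfiguredGraphFactory.open("graph1")').all().get()
    } finally {
        client.close()
        cluster.close()
    }
}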
Re: Traversal binding of dynamically created graphs are not propagated in multi-node cluster
hadoopmarc@...
Hi Anton,
I did not feel like debugging your docker-compose file, but I could not find any test covering your scenario on github/janusgraph either, so I just replayed your scenario with the default janusgraph-full-0.5.3 distribution. These are the steps:
gremlin> :remote connect tinkerpop.server conf/remote8185.yaml session
==>Configured localhost/127.0.0.1:8185-[3aa66b8e-8468-4cd7-95aa-0e642bb8434c]
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8185]-[3aa66b8e-8468-4cd7-95aa-0e642bb8434c] - type ':remote console' to return to local mode
gremlin> map = new HashMap<String, Object>();
gremlin> map.put("storage.backend", "cql");
==>null
gremlin> map.put("storage.hostname", "127.0.0.1");
==>null
gremlin> map.put("graph.graphname", "graph1");
==>null
gremlin> ConfiguredGraphFactory.createConfiguration(new MapConfiguration(map));
==>null
gremlin> graph1 = ConfiguredGraphFactory.open("graph1")
==>standardjanusgraph[cql:[127.0.0.1]]
gremlin> g1 = graph1.traversal()
==>graphtraversalsource[standardjanusgraph[cql:[127.0.0.1]], standard]
gremlin> g1.addV()
==>v[4136]
gremlin> g1.V()
==>v[4136]
gremlin> g1.tx().commit()
==>null

In the second console:

gremlin> :remote connect tinkerpop.server conf/remote8186.yaml session
==>Configured localhost/127.0.0.1:8186-[00729ace-48e0-4896-83e6-2aeb19abe84d]
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8186]-[00729ace-48e0-4896-83e6-2aeb19abe84d] - type ':remote console' to return to local mode
gremlin> graph2 = ConfiguredGraphFactory.open("graph2")
Please create configuration for this graph using the ConfigurationManagementGraph#createConfiguration API.
Type ':help' or ':h' for help.
Display stack trace? [yN]n
gremlin> graph1 = ConfiguredGraphFactory.open("graph1")
==>standardjanusgraph[cql:[127.0.0.1]]
gremlin> g1 = graph1.traversal()
==>graphtraversalsource[standardjanusgraph[cql:[127.0.0.1]], standard]
gremlin> g1.V()
==>v[4136]

The assignment to graph1 differs from what is shown in the ref docs at:
https://docs.janusgraph.org/basics/configured-graph-factory/#binding-example
But otherwise the scenario you are looking for works as expected. I trust you can use it as a reference for debugging your docker-compose file.

Best wishes, Marc
Re: Traversal binding of dynamically created graphs are not propagated in multi-node cluster
Anton Eroshenko <erosh.anton@...>
Hi Marc, The environment properties in docker-compose make it work with Scylla as the backend storage and with ConfiguredGraphFactory for dynamically created graphs. It works as expected except for the sync issues I described above. I attached our logs during start-up in case you'd like to look at them.
On Wed, Mar 24, 2021 at 9:20 PM Anton Eroshenko <erosh.anton@...> wrote:
Re: Janusgraph 0.5.3 potential memory leak
sergeymetallic@...
After rolling back the PR I mentioned at the beginning of the topic, we do not experience any issues. Even back then it was not "out of memory": the process just ate one full CPU core and never recovered. Once all the CPUs are busy, we cannot make any more queries/calls to JanusGraph.
Re: Janusgraph 0.5.3 potential memory leak
Boxuan Li
After understanding more about the context, I feel https://gist.github.com/mad/df729c6a27a7ed224820cdd27209bade is not a fair comparison between the iterator and iterable versions, because it assumes all entries are loaded into memory at once, which isn't necessarily true in real-world scenarios where the input is an AsyncResultSet that uses paging.

The benefit of the iterator version is that it avoids pre-allocating a huge chunk of memory for the byte array. I found some flaws in it (reported at https://github.com/JanusGraph/janusgraph/issues/2524#issuecomment-808857502), but I'm not sure whether that is the root cause. @sergey, do you see any OOM exception when you encounter the issue (JG eats all the memory and becomes unresponsive)? If you could share a heap dump, that would be very helpful as well. Best regards, Boxuan
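To make the comparison concrete, a tiny self-contained Groovy sketch of the two consumption styles; the page data is made up and only stands in for the paging AsyncResultSet in the real code:

// Pretend each sublist is one page fetched from the backend.
def pages = [[1, 2, 3], [4, 5, 6]]

// "Iterable" style, as benchmarked in the gist: everything is materialized
// up front, so the whole result set is resident in memory at once.
def all = pages.flatten()
all.each { row -> println "eager: $row" }

// "Iterator" style: rows are pulled as they are consumed, so at most one
// page needs to be resident at a time.
def lazyPages = pages.iterator()
while (lazyPages.hasNext()) {
    lazyPages.next().each { row -> println "lazy: $row" }
}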
Re: Traversal binding of dynamically created graphs are not propagated in multi-node cluster
hadoopmarc@...
Hi Anton,
If I do a

$ docker run janusgraph/janusgraph:latest

the logs show it runs with the berkeleyje backend. Looking at
https://github.com/JanusGraph/janusgraph-docker/blob/master/0.5/Dockerfile
and your docker-compose file, I cannot see how you make your JanusGraph containers use the scylla/cql backend. So, check the logs of your JanusGraph containers to see what they are running.

And, in case this was not clear: sharing configured graphs between JanusGraph instances is only possible if they share a distributed storage backend. If berkeleyje is used, each JanusGraph container has its own private storage backend.

Best wishes, Marc
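For what it's worth, a minimal docker-compose sketch of one way to point the official image at a CQL backend. The janusgraph/janusgraph image maps environment variables prefixed with janusgraph. into janusgraph.properties; the scylla service name is a placeholder and the exact variable mechanism should be verified against the janusgraph-docker README:

version: "3"
services:
  janusgraph:
    image: janusgraph/janusgraph:latest
    environment:
      JANUS_PROPS_TEMPLATE: cql            # start from the cql properties template
      janusgraph.storage.hostname: scylla  # written into janusgraph.properties by the entrypoint
    ports:
      - "8182:8182"
  scylla:
    image: scylladb/scylla:latest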
Re: Janusgraph 0.5.3 potential memory leak
Boxuan Li
Can someone share how you ran the benchmark (e.g. which JMH version and which JanusGraph version you used) provided by @mad? I ran the benchmark on master (f19df6) but I see OOM errors for both the iterator and the iterable versions. Furthermore, I don't see any OOM report in the final result (JMH simply omits runs with exceptions from the final report).

My environment:
My dependencies:
My benchmark results:
Re: Duplicate Vertex
Boxuan Li
I couldn't reproduce this on the v0.4 branch using the below code:
Traversal binding of dynamically created graphs are not propagated in multi-node cluster
Anton Eroshenko <erosh.anton@...>
Hi,

We use dynamically created graphs in a multi-node JanusGraph cluster. With a single JanusGraph node it seems to work, but when we use more than one, synchronization between the JanusGraph nodes doesn't work: the Gremlin Server on some nodes does not recognize a newly created graph traversal. The documentation page says there is a maximum of a 20 second lag for the binding to take effect on any node in the cluster, but in fact the new traversal is bound only on the node we sent the request to, not on the others, no matter how long you wait. So it looks like a bug.

We're creating a new graph with ConfiguredGraphFactory.create(graphName). It is created successfully, but not propagated to the other nodes. As a workaround I'm calling ConfiguredGraphFactory.open(graphName) on an unsynced instance, but this is not reliable, since from a Java application you don't know which instance the load balancer will redirect you to.

I attached a docker-compose file with which it can be reproduced. There are two JanusGraph instances; they expose different ports. But be aware that two JanusGraph instances starting up at the same time result in a concurrency error on one of the nodes, another issue of the multi-node configuration. So I simply stop one of the containers on start-up and restart it later.
Re: Query not returning always the same result
hadoopmarc@...
Hi Adrian,
What happens if you rewrite the query to:

lmg.traversal().V(analysisVertex).out().emit().repeat(
    __.in().choose(
        __.hasLabel("result"),
        __.has("analysisId", analysisId),
        __.identity()
    )
).tree().next().getTreesAtDepth(3);

I do not understand how leaving out the else clause leads to the random behavior you describe, but it won't hurt to state the intended else clause explicitly. If the else clause is not a valid case in your data model, you do not need the choose() step.

Best wishes, Marc
Query not returning always the same result
Adrián Abalde Méndez <aabalde@...>
Hello,

I'm seeing some strange behaviour with JanusGraph and I would like to post it here and see if anyone can give me some help. I'm running a tree query to get my graph data structured as a tree, and from there I build the results I'm interested in. The query works fine, but the problem is that I don't get the same results every time. It doesn't make any sense that the query returns different trees if the graph is the same and hasn't changed, does it?

The two trees I'm getting are not very different from each other. We have a node type called "group", and some other nodes called "results" hanging from these "groups"; sometimes the tree comes with the results and sometimes not, but it always has the "group" structure. In case you want to know, the query I'm performing is this one:

lmg.traversal().V(analysisVertex).out().emit().repeat(
    __.in().choose(
        __.label().is(P.eq("result")),
        __.where(__.has("analysisId", analysisId))
    )
).tree().next().getTreesAtDepth(3);

Starting from an "analysis" node, I filter the graph down to a tree with just the groups and the results with the analysisId I'm interested in. I guess it is not a problem of the query itself, because when the tree has the results, it works fine. But I don't know why I am getting this strange inconsistent behaviour. Any ideas about this?

Thanks in advance :)
Best regards, Adrian