Re: FoundationDB adapter working status
Hi cedtwo!
The JanusGraph-FDB adapter was never officially supported because it has not left the early stages of development. Although there is some interest in the community, it appears no one is actively maintaining the repo and aligning it with the changes made in JanusGraph, so things can break when major changes in JanusGraph occur. The latest JanusGraph version I used the FDB adapter with was 0.4, so that should still work. The update to version 0.5.2 was in this PR. That said, I cannot recommend using FDB in a production environment yet because, as you experienced, there is very limited support.

Best regards,
Florian
------------------------------------------------------------
Re: P.neq() predicate uses wrong ES mapping
hadoopmarc@...
Hi Sergej,
The example string "??" you used was not an ordinary string: apparently, somewhere in Elasticsearch it is interpreted as a wildcard. See my transcript below with some other property value; there the index behaves according to your and my expectations. I made some attempts to escape the question marks in your example string, like "\\?", but was not successful. The JanusGraph documentation is very quiet on the use of wildcards for indexing backends.

Best wishes,
Marc

bin/janusgraph.sh start
bin/gremlin.sh

graph = JanusGraphFactory.open('conf/janusgraph-cql-es.properties')
mgmt = graph.openManagement()
index = mgmt.buildIndex("indexname", Vertex.class)
xproperty = mgmt.makePropertyKey("x").dataType(String.class).make();
yproperty = mgmt.makePropertyKey("y").dataType(String.class).make();
index.addKey(xproperty, Mapping.TEXTSTRING.asParameter())
index.addKey(yproperty, Mapping.TEXTSTRING.asParameter())
index.buildMixedIndex("search")
mgmt.commit()
ManagementSystem.awaitGraphIndexStatus(graph, 'indexname').status(SchemaStatus.REGISTERED, SchemaStatus.ENABLED).call()
==>GraphIndexStatusReport[success=true, indexName='indexname', targetStatus=[REGISTERED, ENABLED], notConverged={}, converged={x=ENABLED, y=ENABLED}, elapsed=PT0.017S]

g = graph.traversal()
g.addV('Some').property('x', 'x1').property('y', 'y1')
g.addV('Some').property('x', 'x2').property('y', '??')
g.tx().commit()

Expected behaviour:
g.V().has("x","x1").has("y",P.neq("y1"))
===>
g.V().has("x","x1").has("y",P.eq("y1"))
==>v[4224]
g.V().has("x","x1").has("y",P.neq("y4"))
==>v[4224]

Undocumented behaviour:
g.V().has("x","x2").has("y",P.neq("??"))
==>v[4264]
g.V().has("x","x2").has("y",P.eq("??"))
==>v[4264]
g.V().has("x","x2").has("y",P.neq("y4"))
==>v[4264]
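Editor's note, as an assumption rather than something verified in this thread: Elasticsearch's standard analyzer emits no tokens at all for a punctuation-only string such as "??", so any condition evaluated against the tokenized TEXT sub-field has nothing to match against. This can be checked directly against Elasticsearch (host and analyzer below are illustrative):

curl -X POST 'localhost:9200/_analyze' -H 'Content-Type: application/json' -d '{"analyzer": "standard", "text": "??"}'

The expected response is {"tokens":[]}, which would be consistent with the unexpected behaviour of the tokenized field for this value.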
------------------------------------------------------------
FoundationDB adapter working status
cedtwo
Hi. Back in January I opened an issue on the FDB adapter GitHub page to bring attention to issues I had following the README.md. Despite framing the question as issues following the documentation, I was left feeling that the adapter was just not functional with the versions of JanusGraph/FDB stated in the compatibility matrix. I put off working on this side of the app for the last few months, hoping a response would eventually assist in resolving the issues; however, coming back I find I have yet to receive one, and the adapter remains just as I left it. Can anyone clarify if the adapter is still supported and functional? Or should I consider another storage backend?

Thanks guys.
------------------------------------------------------------
Re: Best way to load exported medium-sized graphs
carlos.bobed@...
Hi Marc,
From the Gremlin console, I get:

.with(IO.reader, IO.graphml).read().iterate()] - Batch too large
org.apache.tinkerpop.gremlin.jsr223.console.RemoteException: Batch too large
    at org.apache.tinkerpop.gremlin.console.jsr223.DriverRemoteAcceptor.submit(DriverRemoteAcceptor.java:184)
    at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:234)
    at org.apache.tinkerpop.gremlin.console.GremlinGroovysh.execute(GremlinGroovysh.groovy:110)
    at org.apache.tinkerpop.gremlin.console.Console$_executeInShell_closure19.doCall(Console.groovy:419)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:101)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
    at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:263)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1041)
    at groovy.lang.Closure.call(Closure.java:405)
    at org.codehaus.groovy.runtime.DefaultGroovyMethods.eachWithIndex(DefaultGroovyMethods.java:2246)
    at org.codehaus.groovy.runtime.DefaultGroovyMethods.eachWithIndex(DefaultGroovyMethods.java:2226)
    at org.codehaus.groovy.runtime.DefaultGroovyMethods.eachWithIndex(DefaultGroovyMethods.java:2276)
    at org.codehaus.groovy.runtime.dgm$199.doMethodInvoke(Unknown Source)
    at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:234)
    at org.apache.tinkerpop.gremlin.console.Console.executeInShell(Console.groovy:396)
    at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:234)
    at org.apache.tinkerpop.gremlin.console.Console.<init>(Console.groovy:163)
    at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:234)
    at org.apache.tinkerpop.gremlin.console.Console.main(Console.groovy:502)

Meanwhile, the server log shows on the one side a lot of warnings such as:

1692178 [JanusGraph Cluster-nio-worker-1] WARN com.datastax.driver.core.RequestHandler - Query '[20 statements, 80 bound values] BEGIN UNLOGGED BATCH INSERT INTO janusgraph.edgestore (key,column1,value) VALUES (:key,:column1,:value) USING TIMESTAMP :timestamp; INSERT INTO janusgraph.edgestore (key,column1,value) VALUES (:key,:column1,:value) USING TIMESTAMP :timestamp; INSERT INTO janusgraph.edgestore (key,column1,value) VALUES (:key,:column1,:value) USING TIMESTAMP :timestamp; INSERT INTO janusgraph.edgestore (key,column1,value) VALUES (:key,:column1,:value) USING TIMESTAMP :timestamp; INSERT INTO janusgraph.edgestore (key,column... [truncated output]' generated server side warning(s): Batch of prepared statements for [janusgraph.edgestore] is of size 7626, exceeding specified threshold of 5120 by 2506.

And finally an exception about a temporary failure in the storage backend:

1692180 [gremlin-server-session-1] INFO org.janusgraph.diskstorage.util.BackendOperation - Temporary exception during backend operation [CacheMutation]. Attempting backoff retry.
org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend
    at io.vavr.API$Match$Case0.apply(API.java:3174)
    at io.vavr.API$Match.of(API.java:3137)
    at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.lambda$static$0(CQLKeyColumnValueStore.java:123)
    at org.janusgraph.diskstorage.cql.CQLStoreManager.mutateManyUnlogged(CQLStoreManager.java:526)
    at org.janusgraph.diskstorage.cql.CQLStoreManager.mutateMany(CQLStoreManager.java:457)
    at org.janusgraph.diskstorage.locking.consistentkey.ExpectedValueCheckingStoreManager.mutateMany(ExpectedValueCheckingStoreManager.java:79)
    at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction$1.call(CacheTransaction.java:94)
    at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction$1.call(CacheTransaction.java:91)
    at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:68)
    at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:54)
    at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction.persist(CacheTransaction.java:91)
    at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction.flushInternal(CacheTransaction.java:133)
    at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction.mutate(CacheTransaction.java:86)
    at org.janusgraph.diskstorage.keycolumnvalue.cache.KCVSCache.mutateEntries(KCVSCache.java:65)
    at org.janusgraph.diskstorage.BackendTransaction.mutateEdges(BackendTransaction.java:200)
    at org.janusgraph.graphdb.database.StandardJanusGraph.prepareCommit(StandardJanusGraph.java:628)
    at org.janusgraph.graphdb.database.StandardJanusGraph.commit(StandardJanusGraph.java:731)
    at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.commit(StandardJanusGraphTx.java:1438)
    at org.janusgraph.graphdb.tinkerpop.JanusGraphBlueprintsGraph$GraphTransaction.doCommit(JanusGraphBlueprintsGraph.java:297)
    at org.apache.tinkerpop.gremlin.structure.util.AbstractTransaction.commit(AbstractTransaction.java:104)
    at org.apache.tinkerpop.gremlin.structure.io.graphml.GraphMLReader.readGraph(GraphMLReader.java:132)
    at org.apache.tinkerpop.gremlin.process.traversal.step.sideEffect.IoStep.read(IoStep.java:132)
    at org.apache.tinkerpop.gremlin.process.traversal.step.sideEffect.IoStep.processNextStart(IoStep.java:110)
    at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:143)
    at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:50)
    at org.apache.tinkerpop.gremlin.process.traversal.step.filter.FilterStep.processNextStart(FilterStep.java:37)
    at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:128)
    at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:38)
    at org.apache.tinkerpop.gremlin.process.traversal.Traversal.iterate(Traversal.java:207)
    ...
Caused by: java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.InvalidQueryException: Batch too large
    at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
    at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
    at io.vavr.control.Try.of(Try.java:62)
    at io.vavr.concurrent.FutureImpl.lambda$run$2(FutureImpl.java:199)
    ... 5 more
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Batch too large
    at com.datastax.driver.core.Responses$Error.asException(Responses.java:181)
    at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:215)
    at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:235)
    at com.datastax.driver.core.RequestHandler.access$2600(RequestHandler.java:61)
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:1011)
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:814)
    at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1262)
    at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1180)
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)

I'll keep the logs this time in case more detail is required.

Best,
Carlos
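Editor's note for readers hitting the same error: the 5120 in the warning matches Cassandra's default batch_size_warn_threshold_in_kb of 5 (KiB), and the fatal "Batch too large" is raised once batch_size_fail_threshold_in_kb (default 50) is exceeded. A hedged sketch of the knobs usually involved; the values are illustrative, not recommendations:

# JanusGraph properties file: put fewer statements in each unlogged batch
storage.cql.batch-statement-size=10

# cassandra.yaml: alternatively, raise the server-side thresholds
batch_size_warn_threshold_in_kb: 10
batch_size_fail_threshold_in_kb: 100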
------------------------------------------------------------
Re: P.neq() predicate uses wrong ES mapping
sergeymetallic@...
Hi Marc,
Something like this:

var index = janusGraphManagement.
------------------------------------------------------------
Re: Best way to load exported medium-sized graphs
hadoopmarc@...
Hi Carlos,
I read the preceding discussion with Stephen Mallette, which says: "From the logs, while loading this graph, the Cassandra driver is almost always warning that all batches are over the limit of 5120 (which I haven't found yet where to modify ...)". A complete stacktrace would help indeed, but it strikes me that 5120 equals 20 x 256, while the Cassandra CQL drivers have the following defaults:
------------------------------------------------------------
Best way to load exported medium-sized graphs
cbobed <cbobed@...>
Hi all,
I'm trying to load a GraphML export into JanusGraph 0.5.3. It is not that big (1.68M nodes, 8.8M edges). However, I reach a point where the TinkerPop layer tells me that the batch is too large and it crashes (I suspect that there might be an edge with far too much information ... but finding it is difficult).

I've tried to split the GraphML file into non-overlapping partitions, but JanusGraph does not seem to honor the IDs (I'm actually not sure whether this is omitted at the TinkerPop or the JanusGraph level), and when I reach the split edges part, it inserts new nodes for both source and target.

Has anyone faced this GraphML loading problem before? Should I try to get a GraphSON-translated version of my exported graph to run everything more smoothly? What are the recommendations for dealing with this kind of loading? I'm trying to avoid implementing my own graph loader at this point.

Thank you very much in advance,

Best,
Carlos Bobed
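Editor's note on the ID issue: TinkerPop's GraphMLReader resolves edge endpoints through an in-memory map from file IDs to the vertices it just created, built up during a single read, and JanusGraph by default assigns its own vertex IDs regardless of the IDs in the file. That would explain why split files produce duplicate endpoint vertices. A minimal single-pass load from the Gremlin console, assuming the export sits in export.graphml:

graph = JanusGraphFactory.open('conf/janusgraph-cql.properties')
// one call: the reader's file-id -> vertex map lives only for the duration
// of this read, so edges can still find the vertices created earlier in it
graph.traversal().io('export.graphml').with(IO.reader, IO.graphml).read().iterate()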
------------------------------------------------------------
Re: P.neq() predicate uses wrong ES mapping
hadoopmarc@...
Hi Sergey,
I think I see your point, but for completeness can you be explicit on step 1) and specify your mgmt.buildIndex() statements?

Best wishes,
Marc
------------------------------------------------------------
Re: How to change GLOBAL_OFFLINE configuration when graph can't be instantiated
hadoopmarc@...
Hi Toom,
------------------------------------------------------------
Re: JanusGraph-specific Predicates via Gremlin Language Variants (Python)
Florian Hockmann
Hi, the problem is probably simply that the Python driver doesn't support serialization for JanusGraph predicates with GraphBinary. So, you would need to write your own GraphBinary serializer for JanusGraph predicates in Python. janusgraph-python could provide support for this in the future, but right now it only supports GraphSON, and even that is not in a final state.
There is currently not much progress in janusgraph-python. So, if you want to help improve the JanusGraph support in Python, then any contributions would of course be very welcome.
From: janusgraph-users@... <janusgraph-users@...> On behalf of florian.caesar via lists.lfaidata.foundation
Hi, I'm trying to use JanusGraph's full-text predicates in the gremlin-python client library. Using the GraphSON serializer, I can use a predicate with e.g. "textContains" as the operator, and it works since JanusGraphIoRegistryV1d0 registers its custom deserializer for P objects:
------------------------------------------------------------
Re: How to change GLOBAL_OFFLINE configuration when graph can't be instantiated
toom@...
Thank you for your reply. In my case I can remove the database and create a new one. But what should I do if I want to retrieve data from a JanusGraph database whose configured index backend is not available? Is there any way to disable the index backend without instantiating the database? Or to make the index errors non-fatal during instantiation (in order to change the configuration)?

Toom.
------------------------------------------------------------
P.neq() predicate uses wrong ES mapping
sergeymetallic@...
JanusGraph setup:
Storage backend: Scylla 3
Indexing backend: Elasticsearch 6
JG version: 0.5.3

Steps to reproduce:
1) Create a vertex with two fields mapped in the ES index as TEXTSTRING ("x" and "y")
2) Insert a node with values x="anyvalue", y="??"
3) Execute these queries:
Expected result:
Actual result:
Observation: the issue looks to be in this line: https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-es/src/main/java/org/janusgraph/diskstorage/es/ElasticSearchIndex.java#L959 The code checks for Cmp.EQUAL but not for Cmp.NOT_EQUAL, so in the NOT_EQUAL case the tokenized field is used.
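For step 1, a schema sketch consistent with the steps above (it mirrors the transcript in Marc's reply elsewhere in this digest; the index name is illustrative):

mgmt = graph.openManagement()
index = mgmt.buildIndex("indexname", Vertex.class)
xproperty = mgmt.makePropertyKey("x").dataType(String.class).make()
yproperty = mgmt.makePropertyKey("y").dataType(String.class).make()
index.addKey(xproperty, Mapping.TEXTSTRING.asParameter())
index.addKey(yproperty, Mapping.TEXTSTRING.asParameter())
index.buildMixedIndex("search")
mgmt.commit()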
------------------------------------------------------------
JanusGraph-specific Predicates via Gremlin Language Variants (Python)
florian.caesar
Hi, I'm trying to use JanusGraph's full-text predicates in the gremlin-python client library. Using the GraphSON serializer, I can use a predicate with e.g. "textContains" as the operator, and it works since JanusGraphIoRegistryV1d0 registers its custom deserializer for P objects:

addDeserializer(P.class, new JanusGraphPDeserializerV2d0());

However, as of v0.5.3, JanusGraph does not register any deserializers for GraphBinary (though that feature is already on the master branch). This means that when I submit the exact same traversal with P("textContains", "string") in GraphBinary format, I get:

org.apache.tinkerpop.gremlin.server.handler.OpSelectorHandler - Invalid OpProcessor requested [null]

I presume this is because the "textContains" predicate isn't registered. Weirdly enough, in my Groovy console the same traversal works fine even though it also uses GraphBinary (according to the configuration). There are a couple of options here and I don't have enough information on any of them, so I would appreciate input:

1. Figure out what the Groovy console is doing differently and use that in the Python library
2. Use a Docker image from the master branch and adapt the Python library to use the new custom JanusgraphP type in GraphBinary
3. Use two separate clients with different serializations depending on which traversal I need to run (yuck)

Note: I've already tested https://github.com/JanusGraph/janusgraph-python; it does the same thing I do manually and thus only works with GraphSON.
------------------------------------------------------------
Re: ERROR org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - Application has been killed. Reason: All masters are unresponsive! Giving up.
hadoopmarc@...
Hi Vinayak,
Your properties file says:

spark.master=spark://127.0.0.1:7077

Do you have a Spark standalone cluster running? Does the Spark master reside on 127.0.0.1 and does it listen on port 7077? With Spark on localhost, you can also simply take the "read-cql.properties" which uses all cores on localhost for running the Spark executors.

Best wishes,
Marc
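For reference, a sketch of the local alternative Marc mentions (assuming the stock conf/hadoop-graph/read-cql.properties; the exact value is illustrative):

# run Spark inside the console JVM, one worker thread per available core
spark.master=local[*]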
------------------------------------------------------------
Re: Count Query Optimisation
hadoopmarc@...
Hi Vinayak,
For other readers, see also this other recent thread. A couple of remarks:

Best wishes,
Marc
------------------------------------------------------------
ERROR org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - Application has been killed. Reason: All masters are unresponsive! Giving up.
Vinayak Bali
Hi All,

gremlin> graph = GraphFactory.open('conf/hadoop-graph/read-cql-standalone-cluster.properties')
==>hadoopgraph[cqlinputformat->nulloutputformat]
gremlin> g = graph.traversal().withComputer(SparkGraphComputer)
==>graphtraversalsource[hadoopgraph[cqlinputformat->nulloutputformat], sparkgraphcomputer]
gremlin> g.V().has('title','Plant').count()
11:09:18 WARN org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer - class org.apache.hadoop.mapreduce.lib.output.NullOutputFormat does not implement PersistResultGraphAware and thus, persistence options are unknown -- assuming all options are possible
11:09:20 WARN org.apache.spark.util.Utils - Your hostname, ip-xx-xx-xx-xx resolves to a loopback address: 127.0.0.1; using xx.xx.xx.xx instead (on interface ens5)
11:09:20 WARN org.apache.spark.util.Utils - Set SPARK_LOCAL_IP if you need to bind to another address
11:10:25 ERROR org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - Application has been killed. Reason: All masters are unresponsive! Giving up.
11:10:25 WARN org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - Application ID is not initialized yet.
11:10:25 WARN org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint - Drop UnregisterApplication(null) because has not yet connected to master
11:10:25 WARN org.apache.spark.metrics.MetricsSystem - Stopping a MetricsSystem that is not running
11:10:26 ERROR org.apache.spark.SparkContext - Error initializing SparkContext.
java.lang.NullPointerException
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:560)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
    at org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
    at org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:52)
    at org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:60)
    at org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:313)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
11:10:26 ERROR org.apache.spark.scheduler.AsyncEventQueue - Listener AppStatusListener threw an exception
java.lang.NullPointerException
    at org.apache.spark.status.AppStatusListener.onApplicationEnd(AppStatusListener.scala:157)
    at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:57)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:91)
    at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$super$postToAll(AsyncEventQueue.scala:92)
    at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply$mcJ$sp(AsyncEventQueue.scala:92)
    at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
    at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
    at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:87)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$1$$anonfun$run$1.apply$mcV$sp(AsyncEventQueue.scala:83)
    at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1302)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$1.run(AsyncEventQueue.scala:82)
java.lang.NullPointerException
Type ':help' or ':h' for help.
Display stack trace? [yN]

Hadoop: 3.3.0
Spark: 2.2.2
Scala: 2.11.2
Janusgraph: 0.5.2

I referred to the below documentation for the configuration files:

Thanks & Regards,
Vinayak
------------------------------------------------------------
Re: How to change GLOBAL_OFFLINE configuration when graph can't be instantiated
hadoopmarc@...
Hi,
JanusGraph keeps a record of open instances that sometimes is not updated properly. You can clean it with the methods described here: https://docs.janusgraph.org/advanced-topics/recovery/#janusgraph-instance-failure

Maybe it is no problem if your graph is dropped entirely, so you can also check: https://docs.janusgraph.org/basics/common-questions/#dropping-a-database

After dropping, the graph can be recreated with the right configs.

Best wishes,
Marc
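A short sketch of the two options Marc links to, using the documented management API (the instance id shown is hypothetical):

mgmt = graph.openManagement()
mgmt.getOpenInstances()                       // lists instance ids; the current one is marked '(current)'
mgmt.forceCloseInstance('7f0001012834-node1') // hypothetical stale instance id
mgmt.commit()

// or, if losing the data is acceptable:
JanusGraphFactory.drop(graph)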
------------------------------------------------------------
How to change GLOBAL_OFFLINE configuration when graph can't be instantiated
toom@...
Hi,
I am in a case where the index backend has been incorrectly configured to Elasticsearch. Now, when I try to instantiate my graph database, I get a "ConnectException: Connection refused", even if I set('index.search.backend', 'lucene') in JanusGraphFactory. The setting index.backend is GLOBAL_OFFLINE; I should update it when the graph is instantiated, but how can I change it if the instantiation fails?

Thanks
------------------------------------------------------------
Count Query Optimisation
Vinayak Bali
Hi All,

The data model of the graph is as follows:

Nodes:
Label: Node1, count: 130K
Label: Node2, count: 183K
Label: Node3, count: 437K
Label: Node4, count: 156

Relations:
Node1 to Node2, Label: Edge1, count: 9K
Node2 to Node3, Label: Edge2, count: 200K
Node2 to Node4, Label: Edge3, count: 71K
Node4 to Node3, Label: Edge4, count: 15K
Node4 to Node1, Label: Edge5, count: 1K

The count query used to get the vertex and edge count:

g2.V().has('title', 'Node2').aggregate('v').outE().has('title','Edge2').aggregate('e').inV().has('title', 'Node3').aggregate('v').select('v').dedup().as('vertexCount').select('e').dedup().as('edgeCount').select('vertexCount','edgeCount').by(unfold().count())

This query takes around 3.5 minutes to execute, and the output returned is as follows:

[{"vertexCount":383633,"edgeCount":200166}]

The problem is that traversing the edges takes more time:

g.V().has('title','Node3').dedup().count() takes 3 sec to return 437K nodes.
g.E().has('title','Edge2').dedup().count() takes 1 min to return 200K edges.

In some cases, subsequent calls are faster due to cache usage. I also considered the in-memory backend, but the data is large and I don't think that will work. Is there any way to cache the result at the first execution of the query? Or any approach to load the graph from the CQL backend into memory to improve performance? Please help me to improve the performance; the count query should not take much time.

Janusgraph: 0.5.2
Storage: Cassandra cql

The server specification is high and that is not the issue.

Thanks & Regards,
Vinayak
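Editor's sketch, not from the thread: aggregate('v') and aggregate('e') hold every visited element in memory before the final counts, which can dominate the runtime. Splitting the question into plain count() traversals avoids the side-effect collections; since Node2 and Node3 are disjoint labels, the sum of distinct sources and distinct targets equals the deduplicated endpoint count, but whether this is actually faster would need testing:

// edge count for Node2 -Edge2-> Node3
g.V().has('title','Node2').outE().has('title','Edge2').where(inV().has('title','Node3')).count()

// endpoint count: distinct Node2 sources plus distinct Node3 targets
g.V().has('title','Node2').where(outE().has('title','Edge2').inV().has('title','Node3')).count()
g.V().has('title','Node3').where(inE().has('title','Edge2').outV().has('title','Node2')).count()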
------------------------------------------------------------
Re: Traversal binding of dynamically created graphs are not propagated in multi-node cluster
hadoopmarc@...