Re: Queries with negated text predicates fail with lucene
hadoopmarc@...
Hi Toom,
Yes, you are right, this behavior is not 100% consistent. Also, as noted, the documentation regarding text predicates on properties without an index is incomplete. Use cases are sparse, though, because on a graph of practical size, working without an index is not an option. Finally, improving this in a backward-compatible way might prove impossible. Best wishes, Marc
|
|
Re: potential memory leak
Hi ViVek,
I would suggest upgrading to JanusGraph 0.6.0. It's hard to understand your case from the amount of information you provided. Generally, I would suggest checking that you always close your transactions. Even read queries open new transactions when you read from JanusGraph. If you are using a cached thread pool for your connections (the default option for Tomcat, for example), you may open transactions whose threads are later evicted from the pool while the underlying transactions are never closed; transactions are only closed manually, never automatically. Thus, a leak could happen in this situation. That said, I would simply suggest analyzing your heap dumps to see what exactly is happening in your case; that way you may find the problem that causes the leak.
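For illustration, a minimal sketch (assuming an embedded JanusGraph instance `graph` and the default thread-bound transaction; adapt to your setup):

try {
    graph.traversal().V().limit(10).valueMap().toList();  // even a plain read opens a transaction
    graph.tx().commit();    // explicitly close the thread-bound transaction
} catch (Exception e) {
    graph.tx().rollback();  // never rely on thread eviction to clean up
    throw e;
}

Best regards, Oleksandr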
|
|
Re: Options for Bulk Read/Bulk Export
To add to Marc's suggestions, there is also a multiQuery option in janusgraph-core. Note that it is internal JanusGraph API, not Gremlin, so it might be unavailable to you if you cannot access the JanusGraph internals for some reason.
If you work with multiQuery like `janusGraph.multiQuery().addAllVertices(yourVertices).properties()`, make sure your transaction cache is at least the size of `yourVertices.size()`. Otherwise, additional calls might be executed against your backend, which would be less efficient.
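A sketch of that pattern (assuming `tx` is an open JanusGraphTransaction and `yourVertices` is the batch to preload):

// fetch the properties for the whole batch together instead of one backend call per vertex
mq = tx.multiQuery()
mq.addAllVertices(yourVertices)
props = mq.properties()

// and in the graph configuration, size the transaction cache to at least the batch size:
// cache.tx-cache-size = <at least yourVertices.size()>

Best regards, Oleksandr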
|
|
Re: Options for Bulk Read/Bulk Export
hadoopmarc@...
Hi,
There are three solution directions:
Best wishes, Marc
|
|
Options for Bulk Read/Bulk Export
subbu165@...
Hi there, we have JanusGraph with FoundationDB as the storage backend and Elasticsearch as the index backend. Please let me know the best way to export/read millions of records from JanusGraph, keeping performance in mind. We don't have the option of using Spark in our environment. I have seen hundreds of articles on bulk loading, but not on bulk export/read. Any suggestion would be of great help here.
|
|
Re: Queries with negated text predicates fail with lucene
toom@...
Hi Marc,
IMHO, an index should not prevent a query from working. Moreover, the result of a query should not depend on the backends (storage and index). If an index backend cannot process a predicate, the predicate should be executed as if the index weren't present. To clarify, below is a code sample. The same query works without an index (line 13) and fails with an index (line 31).
1 // create schema
2 mgmt = graph.openManagement()
3 mgmt.makePropertyKey('string').dataType(String.class).cardinality(Cardinality.SINGLE).make()
4 mgmt.makeVertexLabel('data').make()
5 mgmt.commit()
6
7 // add data
8 g.addV('data').property('string', 'foo')
9 ==>v[4120]
10 g.addV('data').property('string', 'bar')
11 ==>v[4312]
12
13 g.V().hasLabel('data').has('string', textNotContains('bar'))
14 WARN org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Query requires iterating over all vertices [(~label = data AND string textNotContains bar)]. For better performance, use indexes
15 ==>v[4120]
16
17 // add index with Lucene backend
18 mgmt = graph.openManagement()
19 string = mgmt.getPropertyKey("string")
20 mgmt.buildIndex('myindex', Vertex.class).addKey(string, Mapping.TEXTSTRING.asParameter()).buildMixedIndex("search")
21 mgmt.commit()
22
23 // Wait for the index
24 ManagementSystem.awaitGraphIndexStatus(graph, 'myindex').call()
25
26 // Reindex data
27 mgmt = graph.openManagement()
28 mgmt.updateIndex(mgmt.getGraphIndex("myindex"), SchemaAction.REINDEX).get()
29 mgmt.commit()
30
31 g.V().hasLabel('data').has('string', textNotContains('bar'))
32 Could not call index
Regards, Toom.
|
|
Re: Potential transaction issue (JG 0.6.0)
Charles <dostanian@...>
I have also encountered this problem, but I have found a reliable way to reproduce it. The following queries work perfectly in the Gremlin console (both server and client from the JanusGraph 0.6.0 distribution):
gremlin> g.V().has("COMPANY", "companyId", 44507).out("EMPLOYS").has("status", "APPROVED").skip(0).limit(10).elementMap("workerId")
gremlin> g.V().has("COMPANY", "companyId", 44507).out("EMPLOYS").has("status", "APPROVED").order().by("lastName").by("firstName").skip(0).limit(10).elementMap("workerId")
In Java the first query fails with the same exception as in your trace. If I pass offset = 0 (zero) it seems to work; sometimes fiddling with the offset (3, 5, 10 and so on) does too:
return traversal.V().has(VertexType.COMPANY.name(), CompanyWrapper.PROP_COMPANY_ID, companyId)
To make the query succeed I have to either remove the skip() and limit() or order the results before skipping and limiting, i.e.:
return traversal.V().has(VertexType.COMPANY.name(), CompanyWrapper.PROP_COMPANY_ID, companyId)
Quite a number of my queries depend on this style of code: start with a known node, traverse edges, apply a has() clause, and then skip and limit.
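For clarity, the two variants spelled out (the tails of the truncated chains above are reconstructed from the console queries, so treat them as an assumption):

// fails with the NullPointerException in the trace below when the offset is non-zero
return traversal.V().has(VertexType.COMPANY.name(), CompanyWrapper.PROP_COMPANY_ID, companyId)
        .out("EMPLOYS").has("status", "APPROVED")
        .skip(offset).limit(10).elementMap("workerId");

// works: order() before skip()/limit()
return traversal.V().has(VertexType.COMPANY.name(), CompanyWrapper.PROP_COMPANY_ID, companyId)
        .out("EMPLOYS").has("status", "APPROVED")
        .order().by("lastName").by("firstName")
        .skip(offset).limit(10).elementMap("workerId");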
9674 [gremlin-server-exec-2] WARN org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor - Exception processing a Traversal on iteration for request [6be9a75f-fa5d-4ef1-a61e-7d133bca33c8].
2021-10-15T21:32:09.932504000Z java.lang.NullPointerException
2021-10-15T21:32:09.932569000Z at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.getInternalVertex(StandardJanusGraphTx.java:508)
2021-10-15T21:32:09.932611000Z at org.janusgraph.graphdb.query.vertex.VertexLongList.get(VertexLongList.java:72)
2021-10-15T21:32:09.932652000Z at org.janusgraph.graphdb.query.vertex.VertexLongList$1.next(VertexLongList.java:144)
2021-10-15T21:32:09.932697000Z at org.janusgraph.graphdb.query.vertex.VertexLongList$1.next(VertexLongList.java:131)
2021-10-15T21:32:09.932734000Z at org.apache.tinkerpop.gremlin.process.traversal.step.map.FlatMapStep.processNextStart(FlatMapStep.java:45)
2021-10-15T21:32:09.932774000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
2021-10-15T21:32:09.932812000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
2021-10-15T21:32:09.932852000Z at org.apache.tinkerpop.gremlin.process.traversal.step.filter.FilterStep.processNextStart(FilterStep.java:37)
2021-10-15T21:32:09.932890000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
2021-10-15T21:32:09.932927000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
2021-10-15T21:32:09.932961000Z at org.apache.tinkerpop.gremlin.process.traversal.step.filter.FilterStep.processNextStart(FilterStep.java:37)
2021-10-15T21:32:09.932995000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
2021-10-15T21:32:09.933033000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
2021-10-15T21:32:09.933068000Z at org.apache.tinkerpop.gremlin.process.traversal.step.map.ScalarMapStep.processNextStart(ScalarMapStep.java:39)
2021-10-15T21:32:09.933109000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
2021-10-15T21:32:09.933143000Z at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:222)
2021-10-15T21:32:09.933188000Z at org.apache.tinkerpop.gremlin.server.util.TraverserIterator.fillBulker(TraverserIterator.java:69)
2021-10-15T21:32:09.933228000Z at org.apache.tinkerpop.gremlin.server.util.TraverserIterator.hasNext(TraverserIterator.java:56)
2021-10-15T21:32:09.933265000Z at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.handleIterator(TraversalOpProcessor.java:410)
2021-10-15T21:32:09.933299000Z at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.lambda$iterateBytecodeTraversal$0(TraversalOpProcessor
|
|
Re: Queries with negated text predicates fail with lucene
hadoopmarc@...
Hi Toom,
See https://docs.janusgraph.org/index-backend/text-search/#full-text-search_1. Indeed, the negative text predicates are only available for Elasticsearch (and, as you say, apparently for the CompositeIndex).
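As an untested workaround (an assumption on my part, not verified against Lucene), you could wrap the positive predicate in a not() step, which JanusGraph should then evaluate in memory instead of pushing textNotContains down to the index:

g.V().hasLabel('data').not(has('string', textContains('bar')))

Best wishes, Marc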
|
|
Queries with negated text predicates fail with lucene
toom@...
Hi,
With JanusGraph 0.6.0 and the Lucene index backend, queries fail if they contain predicates like textNotPrefix or textNotContains:
java.lang.IllegalArgumentException: Relation is not supported for string value: textNotPrefix
at org.janusgraph.diskstorage.lucene.LuceneIndex.convertQuery(LuceneIndex.java:814)
at org.janusgraph.diskstorage.lucene.LuceneIndex.convertQuery(LuceneIndex.java:864)
at org.janusgraph.diskstorage.lucene.LuceneIndex.query(LuceneIndex.java:593)
at org.janusgraph.diskstorage.indexing.IndexTransaction.queryStream(IndexTransaction.java:110)
If Elasticsearch is used, or if there is no index backend, the same query works. I'm not sure the Lucene index can be used for negated queries, but the queries should not fail. How can I transform my query to make it work? Regards, Toom.
|
|
potential memory leak
Vivek Singh Raghuwanshi
Hi Team, we are facing some issues with JanusGraph 0.5.3; we captured heap dumps and found memory leaks. Can you please check whether this is a leak suspect? ViVek Raghuwanshi Mobile +1-847-848-7388 Google Number +1-707-847-8481 http://in.linkedin.com/in/vivekraghuwanshi
|
|
Re: Query performance with range
hadoopmarc@...
Hi Claudio,
Paging with range() can only work with a vertex-centric index; otherwise the vertex table is scanned for every page. If you just want all results, the alternative is to forget about the range() step and iterate over the query result.
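For example, a sketch based on the query from your post (one pass, no per-page rescan):

t = g.V().has('name', 'action').to(Direction.IN, 'has_genre').has('value', 'a').valueMap('id')
while (t.hasNext()) {
    row = t.next()  // results stream in while you consume them
    // process row ...
}

Marc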
|
|
Re: Thread goes into Waiting state forever
hadoopmarc@...
Hi Tanroop,
Does the problem also occur if you replace v() with V().limit(1) inside your query? If not, at what result size does your issue start to occur? Btw, your post has a typo: "user" and "uidx" should be the same, but I assume it stems from anonymizing your query. Marc
|
|
Query performance with range
Claudio Fumagalli
Hi,
I have some performance issues extracting the nodes attached to a node with pagination. I have a simple graph with a CompositeIndex on the property name (please find the schema definition in the attachments). The graph has 3 genre nodes:
Our goal is, given a genre, to extract the 20K movies attached to "action" that have value="a". This should be done iteratively, limiting the chunk of data extracted at each execution (i.e. we paginate the query using range). I'm using Janus 0.6.0 with Cassandra 3.11.6. Please find attached the docker-compose file I used to create the janus+cassandra environment, as well as the janus and gremlin configurations. This is the query we use to extract a page: g.V().has("name", "action").to(Direction.IN, "has_genre").has("value", "a").range(skip, limit).valueMap("id").next(); Here are the results of the extraction with different page sizes:
This is the profile of the query for the last chunk:
gremlin> g.V().has("name", "action").to(Direction.IN, "has_genre").has("value", "a").range(19900, 20000).valueMap("id").profile();
==>Traversal Metrics
Step Count Traversers Time (ms) % Dur
=============================================================================================================
JanusGraphStep([],[name.eq(action)]) 1 1 0.904 0.01
constructGraphCentricQuery 0.169
GraphCentricQuery 0.591
\_condition=(name = action)
\_orders=[]
\_isFitted=true
\_isOrdered=true
\_query=multiKSQ[1]
\_index=ByName
backend-query 1 0.460
\_query=ByName:multiKSQ[1]
JanusGraphMultiQueryStep 1 1 0.059 0.00
JanusGraphVertexStep(IN,[has_genre],vertex) 40000 40000 240.729 2.31
\_condition=type[has_genre]
\_orders=[]
\_isFitted=true
\_isOrdered=true
\_query=has_genre:SliceQuery[0x71E1,0x71E2)
\_multi=true
\_vertices=1
optimization 0.019
backend-query 40000 86.565
\_query=has_genre:SliceQuery[0x71E1,0x71E2)
HasStep([value.eq(a)]) 20000 20000 10157.815 97.51
RangeGlobalStep(19900,20000) 100 100 15.166 0.15
PropertyMapStep([id],value) 100 100 2.791 0.03
>TOTAL - - 10417.467 -
It seems that the condition has("value", "a") is evaluated by reading each of the attached nodes one by one and then applying the filter. Is this the expected behaviour and performance? Is there any possible optimization in the interaction between Janus and Cassandra (for example, reading the attached nodes in bulk)? We have verified that activating the db-cache (cache.db-cache=true) has a huge impact on performance, but this is not easily applicable in our real scenario, because we have multiple Janus nodes (to support the scaling of the system) and with the cache active we risk reading stale data (the data are updated frequently and the changes must be read by other services in our processing pipeline). Thank you
|
|
Re: GraphTraversal Thread Stuck
ssbothe3@...
Hi, so I will prefer to have a safety net of an executor service, which will prevent too many parallel calls to the CQL driver. And it looks like someone else also faced a similar issue.
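A sketch of such a safety net (sizes and names are illustrative; assumes an existing GraphTraversalSource `g` and a context that handles the checked exceptions):

import java.util.concurrent.*;

ExecutorService pool = Executors.newFixedThreadPool(8);  // bound the parallel load on the CQL driver
Future<List<Object>> f = pool.submit(() -> g.V().limit(10).values("name").toList());
List<Object> result = f.get(30, TimeUnit.SECONDS);       // time-box the call so a thread can never wait forever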
Thanks,
|
|
Re: Thread goes into Waiting state forever
ssbothe3@...
Hi Tanroop,
|
|
Re: Janusgraph upgrade 0.5.2 --> 0.6.0 | CQL issue
hadoopmarc@...
Hi Pawan,
OK, I did a small upgrade session myself and did not encounter any issues (apart from having to set num_tokens: 4 in conf/cassandra.yaml). This is what I did:
Best wishes, Marc
|
|
Re: JanusGraph custom types
schwartz@...
I've seen this - https://docs.janusgraph.org/advanced-topics/serializer/
But an example would greatly help (btw, we develop mostly in Python).
|
|
JanusGraph custom types
Hi!
Is there a way to add custom property types to JanusGraph (similar to RelationIdentifier / Geoshape) to be later used by the GraphSON serializer? Ideally without re-building JanusGraph. Due to some downstream limitations, we sometimes can't serialize Python datetime objects. It would be great if we could somehow add a custom step such that .values("date-field").as_iso() would return the field in ISO 8601.
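For illustration, the effect we are after can presumably be approximated today with a server-side Groovy closure (an untested sketch assuming the property is stored as a java.util.Date; as_iso() itself does not exist):

g.V().values('date-field').map{ it.get().toInstant().toString() }  // Date -> ISO 8601 string

Many thanks!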
|
|
Re: Janusgraph upgrade 0.5.2 --> 0.6.0 | CQL issue
Pawan Shriwas
Hi Marc, I have also tried with "graph.allow-upgrade=true" and added "storage.cql.request-timeout" as well, but I am still facing the same issue. Please check the attached JanusGraph server log for your reference. JanusGraph version 0.6.0. Cassandra details - Cassandra 3.9 Snapshot | CQL 3.4.2. Thanks, Pawan
|
|
Re: Janusgraph upgrade 0.5.2 --> 0.6.0 | CQL issue
hadoopmarc@...
Hi Pawan,
I lack the practical experience here, but it seems you have to set graph.allow-upgrade=true (although it is not clear whether this applies to all version changes or only to "storage version" changes): https://docs.janusgraph.org/v0.6/configs/configuration-reference/ If this is the solution, the error message is certainly not helpful and the upgrade docs are incomplete. Best wishes, Marc
|
|