
Re: Options for Bulk Read/Bulk Export

Oleksandr Porunov
 

> Also @oleksandr, you have stated that "Otherwise, additional calls might be executed against your backend, which could be less efficient." How should we do these additional calls and get the subsequent records? Let's say I'm exporting 10M records and our cache/memory size doesn't support that much, so first I retrieve records 1 to 1M, then 1M to 2M, then 2M to 3M, and so on. How can we iterate this way, and how can this be achieved in Janus? Please throw some light on this.

Not sure I'm fully following, but I'll try to add some more clarity.
- Vertex ids are not cleared from the `vertex` objects. So, when you return vertices you simply hold them in your heap, but all edges / properties are managed by internal caches. By default, if you return vertices you don't return their properties / edges.
To return properties for vertices you can use the `valueMap`, `properties`, or `values` Gremlin steps.
In the previous message I wasn't talking about using Gremlin but about `multiQuery`, which is a JanusGraph feature. `multiQuery` may store data in the tx-cache if you preload your properties.
To use `multiQuery` you must provide the vertices for which you want to preload properties (think of them as simple vertex ids rather than a collection of all vertex data). After you preload properties they are stored in the tx-level cache, and they may also be stored in the db-level cache if you enabled that. After that you can access vertex properties without additional calls to the underlying database; instead those properties are served from the tx-level cache.
There is a property `cache.tx-cache-size`, described as `Maximum size of the transaction-level cache of recently-used vertices.` By default it's 20000, but you can configure this individually per transaction when you create the transaction.
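For illustration, here is a minimal sketch (my own example, not from the original thread) of overriding the cache size for a single transaction. It assumes the TransactionBuilder API with vertexCacheSize is available in your version, and the properties file path is just a placeholder:

import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import org.janusgraph.core.JanusGraphTransaction;

public class ChunkedReadTransaction {
    public static void main(String[] args) {
        JanusGraph graph = JanusGraphFactory.open("conf/janusgraph.properties"); // placeholder config path

        // Override cache.tx-cache-size for this transaction only,
        // so a 1M-vertex chunk fits into the tx-level cache.
        JanusGraphTransaction tx = graph.buildTransaction()
                .vertexCacheSize(1_000_000)
                .start();
        try {
            // ... preload and process one chunk of vertices here ...
        } finally {
            tx.rollback(); // read-only work, so just release the transaction and its cache
        }
        graph.close();
    }
}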
Since, as you said, you don't have the possibility to store 10M vertices in your cache, you need to split your work into chunks.
Basically something like:
janusGraph.multiQuery().addAllVertices(yourFirstMillionVertices).properties().forEach(
// process your vertex properties
);
janusGraph.multiQuery().addAllVertices(yourSecondMillionVertices).properties().forEach(
// process your vertex properties.
// Once yourFirstMillionVertices have been processed, they will be evicted from the tx-level cache because yourSecondMillionVertices are now the recently-used vertices.
);
janusGraph.multiQuery().addAllVertices(yourThirdMillionVertices).properties().forEach(
// process your vertex properties
// Once yourSecondMillionVertices have been processed, they will be evicted from the tx-level cache because yourThirdMillionVertices are now the recently-used vertices.
);
// ...

You may also simply close and reopen transactions after you have processed a chunk of your data.
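A rough sketch of that per-chunk pattern (my own illustration; the chunk collections are placeholders, and it assumes multiQuery().properties() returns a map keyed by vertex as in recent versions; depending on how the chunks were obtained, you may need to re-resolve their vertices inside the new transaction):

import java.util.Collection;
import java.util.List;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphVertex;

class ChunkedPropertyExport {
    // Preload and process vertex properties one chunk at a time, closing the
    // thread-local transaction between chunks so its tx-level cache is released.
    static void exportInChunks(JanusGraph janusGraph, List<Collection<JanusGraphVertex>> chunks) {
        for (Collection<JanusGraphVertex> chunk : chunks) {
            janusGraph.multiQuery()
                      .addAllVertices(chunk)
                      .properties()
                      .forEach((vertex, properties) -> {
                          // process the preloaded properties of this vertex
                      });
            // Closing the current transaction releases the chunk from the tx-level cache;
            // the next multiQuery() call will run in a fresh transaction.
            janusGraph.tx().rollback();
        }
    }
}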

Under the hood, multiQuery will use either your backend's batching feature or https://docs.janusgraph.org/configs/configuration-reference/#storageparallel-backend-executor-service

In case you are trying to find a good executor service, I would suggest looking at a scalable executor service like https://github.com/elastic/elasticsearch/blob/dfac67aff0ca126901d72ed7fe862a1e7adb19b0/server/src/main/java/org/elasticsearch/common/util/concurrent/EsExecutors.java#L74-L81
or similar executor services. I wouldn't recommend executor services without an upper bound, like a cached thread pool, because they are quite dangerous.
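For example, a plain JDK bounded pool (my own sketch, not the Elasticsearch executors linked above) that caps both the thread count and the queue instead of growing without limit:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedExecutors {
    // At most 16 threads and at most 1000 queued tasks; once both are full,
    // new submissions run on the caller's thread instead of piling up unbounded.
    static ExecutorService newBoundedExecutor() {
        return new ThreadPoolExecutor(
                4, 16,                        // core and maximum pool size
                30, TimeUnit.SECONDS,         // reclaim idle threads above the core size
                new ArrayBlockingQueue<>(1000),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }
}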

Hope it helps somehow.

Best regards,
Oleksandr


Re: Janusgraph Schema dump

hadoopmarc@...
 

Hi Pawan,

1. See https://docs.janusgraph.org/schema/#displaying-schema-information
2. See the manuals of your storage and indexing backends (and older questions on this list)
3. Please elaborate; somehow your question does not make sense to me

Best wishes,

Marc


Re: Janusgraph Schema dump

Pawan Shriwas
 

Adding one more point:

 3. How can we get the properties mapped to a label, or to all labels?


On Thu, Oct 21, 2021 at 7:46 PM Pawan Shriwas <shriwas.pawan@...> wrote:
Hi All,

Can anyone let me know how to do the two items below in JanusGraph?

1. Database schema dump (we want to use the same schema dump on another environment using the same export)
2. Database data dump for the backup and restore case.

Thanks,
Pawan


--
Thanks & Regard

PAWAN SHRIWAS


Janusgraph Schema dump

Pawan Shriwas
 

Hi All,

Can anyone let me know how to do the two items below in JanusGraph?

1. Database schema dump (we want to use the same schema dump on another environment using the same export)
2. Database data dump for the backup and restore case.

Thanks,
Pawan


Re: Options for Bulk Read/Bulk Export

subbu165@...
 

So currently we have JanusGraph with FoundationDB (FDB) as the storage backend and Elasticsearch for indexing.
 
First we get the vertex ids from the Elasticsearch index backend, and then below is what we do:
JanusGraph graph = JanusGraphFactory.open(janusConfig);
Vertex vertex = graph.vertices(vertexId).next(); 
 
All of the above, including getting the vertex ids from Elasticsearch, happens within the Spark context, using a Spark RDD for partitioning and parallelisation. If we take Spark out of the equation, what is the best way to do a bulk export?
Also @oleksandr, you have stated that "Otherwise, additional calls might be executed against your backend, which could be less efficient." How should we do these additional calls and get the subsequent records? Let's say I'm exporting 10M records and our cache/memory size doesn't support that much, so first I retrieve records 1 to 1M, then 1M to 2M, then 2M to 3M, and so on. How can we iterate this way, and how can this be achieved in Janus? Please throw some light on this.


Re: Queries with negated text predicates fail with lucene

hadoopmarc@...
 

Hi Toom,

Yes, you are right, this behavior is not 100% consistent. Also, as noted, the documentation regarding text predicates on properties without index is incomplete. Use cases are sparse, though, because on a graph of practical size, working without index is not an option. Finally, improving this in a backward compatible way might prove impossible.

Best wishes,

Marc


Re: potential memory leak

Oleksandr Porunov
 

Hi ViVek,

I would suggest upgrading to JanusGraph 0.6.0.
It's hard to understand your case from the amount of information you provided. Generally, I would suggest checking that you always close transactions. Even read queries open new transactions when you read from JanusGraph. If you are using a cached thread pool for your connections (the default option for Tomcat, for example), you may open transactions and, after a while, the related threads may be evicted from the pool while the underlying transactions are never closed automatically (only manually). Thus, a leak could potentially happen in this situation. That said, I would simply suggest analyzing your heap dumps to see what exactly is happening in your situation; that way you can potentially find the problem which causes the leak.
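As a minimal illustration of the "always close transactions" point (my own sketch, not code from your application), one common pattern is to explicitly roll back the thread-local transaction after read-only work:

import org.janusgraph.core.JanusGraph;

class ReadWithExplicitClose {
    // Even a pure read opens a thread-local transaction; close it explicitly so it is
    // not left open if the worker thread is later evicted from a pool.
    static long countVertices(JanusGraph graph) {
        try {
            return graph.traversal().V().count().next();
        } finally {
            graph.tx().rollback(); // closes the implicit read transaction for this thread
        }
    }
}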

Best regards,
Oleksandr


Re: Options for Bulk Read/Bulk Export

Oleksandr Porunov
 

To add to Marc's suggestions, there is also a multiQuery option in janusgraph-core. Note that it's an internal API and not Gremlin; thus, it might be unavailable to you if you cannot access the JanusGraph internal API for any reason.
If you work with multiQuery like `janusGraph.multiQuery().addAllVertices(yourVertices).properties()`, then make sure your transaction cache is at least the size of `yourVertices.size()`. Otherwise, additional calls might be executed against your backend, which could be less efficient.

Best regards,
Oleksandr


Re: Options for Bulk Read/Bulk Export

hadoopmarc@...
 

Hi,

There are three solution directions:
  1. if you have keys to your vertices available, either vertex ids or unique values of some vertex property, you can start as many gremlin clients as your backends can handle and distribute the keys over the clients. This is the easy case, but often not applicable.
  2. if there are no keys available, gremlin can only help you with a full table scan g.V(). If you have a client machine with many cores, the withComputer() step, either with or without spark-local, will help you parallelize the scan.
  3. you can copy the vertex files from the storage backend and decode them offline. Decoding procedures are implicit in the janusgraph source code, but I am not aware of any library that does this for you explicitly.
You decide, but I would suggest option 2 with spark-local as the option that works out of the box.
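For option 2, a minimal sketch (my own illustration; it uses the graph's default OLAP computer rather than spark-local, and the config path is a placeholder):

import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

class FullScanSketch {
    public static void main(String[] args) {
        JanusGraph graph = JanusGraphFactory.open("conf/janusgraph.properties"); // placeholder config path
        // withComputer() runs the full table scan g.V() through the graph's OLAP
        // computer, parallelizing it over the available cores.
        GraphTraversalSource g = graph.traversal().withComputer();
        long vertexCount = g.V().count().next();
        System.out.println("vertices scanned: " + vertexCount);
        graph.close();
    }
}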

Best wishes,    Marc


Options for Bulk Read/Bulk Export

subbu165@...
 

Hi there, we have JanusGraph with FoundationDB as the backend store and Elasticsearch as the index backend. Please let me know the best way to export/read millions of records from JanusGraph, keeping performance in mind. We don't have the option of using Spark in our environment. I have seen hundreds of articles on bulk loading but not on bulk export/read. Any suggestion would be of great help here.


Re: Queries with negated text predicates fail with lucene

toom@...
 

Hi Marc,

IMHO, an index should not prevent a query from working. Moreover, the result of a query should not depend on the backends (storage and index). If an index backend cannot process a predicate, the predicate should be executed as if the index weren't present.

To clarify, below is a code sample. The same query works without an index (line 13) and fails with an index (line 31).
     1  // create schema
     2  mgmt = graph.openManagement()
     3  mgmt.makePropertyKey('string').dataType(String.class).cardinality(Cardinality.SINGLE).make()
     4  mgmt.makeVertexLabel('data').make()
     5  mgmt.commit()
     6
     7  // add data
     8  g.addV('data').property('string', 'foo')
     9  ==>v[4120]
    10  g.addV('data').property('string', 'bar')
    11  ==>v[4312]
    12
    13  g.V().hasLabel('data').has('string', textNotContains('bar'))
    14  WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [(~label = data AND string textNotContains bar)]. For better performance, use indexes
    15  ==>v[4120]
    16
    17  // add index with the lucene backend
    18  mgmt = graph.openManagement()
    19  string = mgmt.getPropertyKey("string")
    20  mgmt.buildIndex('myindex', Vertex.class).addKey(string, Mapping.TEXTSTRING.asParameter()).buildMixedIndex("search")
    21  mgmt.commit()
    22
    23  // Wait for the indexes
    24  ManagementSystem.awaitGraphIndexStatus(graph, 'myindex').call()
    25
    26  // Reindex data
    27  mgmt = graph.openManagement()
    28  mgmt.updateIndex(mgmt.getGraphIndex("myindex"), SchemaAction.REINDEX).get()
    29  mgmt.commit()
    30
    31  g.V().hasLabel('data').has('string', textNotContains('bar'))
    32  Could not call index

Regards,

Toom.


Re: Potential transaction issue (JG 0.6.0)

Charles
 

I have also encountered this problem, but I have found a reliable way to reproduce it.

The following queries work perfectly in the Gremlin Console (both server and client from the JanusGraph 0.6.0 distribution):

gremlin> g.V().has("COMPANY", "companyId", 44507).out("EMPLOYS").has("status", "APPROVED").skip(0).limit(10).elementMap("workerId")
gremlin> g.V().has("COMPANY", "companyId", 44507).out("EMPLOYS").has("status", "APPROVED").order().by("lastName").by("firstName").skip(0).limit(10).elementMap("workerId")

In Java this query fails with the same exception as in your trace if I pass offset = 0 (zero); it sometimes seems to work when fiddling with the offset (3, 5, 10 and so on).
 
return traversal.V().has(VertexType.COMPANY.name(), CompanyWrapper.PROP_COMPANY_ID, companyId)
.out(EdgeType.EMPLOYS.name())
.has(WorkerWrapper.PROP_STATUS, WORKER_STATUS_APPROVED)
.skip(offset)
.limit(limit)
.elementMap(properties)
.toStream()
.map(WorkerWrapper::of);

To make the query succeed I have to either remove the skip() and limit() or order the results before skipping and limiting i.e.

return traversal.V().has(VertexType.COMPANY.name(), CompanyWrapper.PROP_COMPANY_ID, companyId)
.out(EdgeType.EMPLOYS.name())
.has(WorkerWrapper.PROP_STATUS, WORKER_STATUS_APPROVED)
.order()
.by(WorkerWrapper.PROP_LAST_NAME).by(WorkerWrapper.PROP_FIRST_NAME)
.skip(offset)
.limit(limit)
.elementMap(properties)
.toStream()
.map(WorkerWrapper::of);
 
Quite a number of my queries depend on this style of code: start with a known node, traverse edges, apply a has clause, and then skip and limit.

9674 [gremlin-server-exec-2] WARN  org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor  - Exception processing a Traversal on iteration for request [6be9a75f-fa5d-4ef1-a61e-7d133bca33c8].
2021-10-15T21:32:09.932504000Z java.lang.NullPointerException
2021-10-15T21:32:09.932569000Z at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.getInternalVertex(StandardJanusGraphTx.java:508)
2021-10-15T21:32:09.932611000Z at org.janusgraph.graphdb.query.vertex.VertexLongList.get(VertexLongList.java:72)
2021-10-15T21:32:09.932652000Z at org.janusgraph.graphdb.query.vertex.VertexLongList$1.next(VertexLongList.java:144)
2021-10-15T21:32:09.932697000Z at org.janusgraph.graphdb.query.vertex.VertexLongList$1.next(VertexLongList.java:131)
2021-10-15T21:32:09.932734000Z at org.apache.tinkerpop.gremlin.process.traversal.step.map.FlatMapStep.processNextStart(FlatMapStep.java:45)
2021-10-15T21:32:09.932774000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
2021-10-15T21:32:09.932812000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
2021-10-15T21:32:09.932852000Z at org.apache.tinkerpop.gremlin.process.traversal.step.filter.FilterStep.processNextStart(FilterStep.java:37)
2021-10-15T21:32:09.932890000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
2021-10-15T21:32:09.932927000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
2021-10-15T21:32:09.932961000Z at org.apache.tinkerpop.gremlin.process.traversal.step.filter.FilterStep.processNextStart(FilterStep.java:37)
2021-10-15T21:32:09.932995000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
2021-10-15T21:32:09.933033000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
2021-10-15T21:32:09.933068000Z at org.apache.tinkerpop.gremlin.process.traversal.step.map.ScalarMapStep.processNextStart(ScalarMapStep.java:39)
2021-10-15T21:32:09.933109000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
2021-10-15T21:32:09.933143000Z at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:222)
2021-10-15T21:32:09.933188000Z at org.apache.tinkerpop.gremlin.server.util.TraverserIterator.fillBulker(TraverserIterator.java:69)
2021-10-15T21:32:09.933228000Z at org.apache.tinkerpop.gremlin.server.util.TraverserIterator.hasNext(TraverserIterator.java:56)
2021-10-15T21:32:09.933265000Z at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.handleIterator(TraversalOpProcessor.java:410)
2021-10-15T21:32:09.933299000Z at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.lambda$iterateBytecodeTraversal$0(TraversalOpProcessor
 
 


Re: Queries with negated text predicates fail with lucene

hadoopmarc@...
 

Hi Toom,

See, https://docs.janusgraph.org/index-backend/text-search/#full-text-search_1

Indeed, the negative text predicates are only available to Elasticsearch (and, apparently as you say, to the CompositeIndex).

Best wishes,    Marc


Queries with negated text predicates fail with lucene

toom@...
 

Hi,

With JanusGraph 0.6.0 and the Lucene index backend, queries fail if they contain predicates like textNotPrefix or textNotContains:
java.lang.IllegalArgumentException: Relation is not supported for string value: textNotPrefix
        at org.janusgraph.diskstorage.lucene.LuceneIndex.convertQuery(LuceneIndex.java:814)
        at org.janusgraph.diskstorage.lucene.LuceneIndex.convertQuery(LuceneIndex.java:864)
        at org.janusgraph.diskstorage.lucene.LuceneIndex.query(LuceneIndex.java:593)
        at org.janusgraph.diskstorage.indexing.IndexTransaction.queryStream(IndexTransaction.java:110)

If Elasticsearch is used, or if there is no index backend, the same query works.
I'm not sure the Lucene index can be used for negated queries, but the queries should not fail. How can I transform my query to make it work?

Regards,

Toom.


potential memory leak

Vivek Singh Raghuwanshi
 

Hi Team,

We are facing some issues with JanusGraph 0.5.3; we captured heap dumps and found memory leaks.
Can you please check if this is a leak suspect?
[image attachment: image.png]



--
ViVek Raghuwanshi
Mobile +1-847-848-7388
Google Number +1-707-847-8481
http://in.linkedin.com/in/vivekraghuwanshi


Re: Query performance with range

hadoopmarc@...
 

Hi Claudio,

Paging with range() can only work with a vertex-centric index; otherwise the vertex table is scanned for every page. If you just want all results, the alternative is to forget about the range() step and iterate over the query result.
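As a rough illustration of that last option (my own sketch, reusing the property and edge names from the original question further down; not a drop-in for your code), you can issue the filtered query once and simply iterate the traversal:

import java.util.Map;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.Direction;
import org.apache.tinkerpop.gremlin.structure.Vertex;

class IterateWithoutRange {
    // Run the filtered query once and stream every result, instead of re-running it
    // with a different range() window for each page.
    static void exportAll(GraphTraversalSource g) {
        GraphTraversal<Vertex, Map<Object, Object>> movies = g.V().has("name", "action")
                .to(Direction.IN, "has_genre")
                .has("value", "a")
                .valueMap("id");
        while (movies.hasNext()) {
            Map<Object, Object> movie = movies.next();
            // process one movie here
        }
    }
}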

Marc


Re: Thread goes into Waiting state forever

hadoopmarc@...
 

Hi Tanroop,

Does the problem also occur if you replace v() with V().limit(1) inside your query?

If not, at what result size does your issue start to occur?

Btw, your post has a typo: "user" and "uidx" should be the same, but I assume it stems from anonymizing your query.

Marc


Query performance with range

Claudio Fumagalli
 

Hi,

I have a performance issue extracting the nodes attached to a node with pagination.
I have a simple graph with a CompositeIndex on the name property (please find the schema definition in the attachments).
The graph has 3 genre nodes:
  • "action" node has 20K attached movies with value="a" and 20K attached movies with value="b".
  • "drama" node has 10K attached movies with value="b"
  • "comedy" node has 10K attached movies with value="c"
Genre nodes have id and name properties; movie nodes have id, name and value properties.

Our goal, given a genre, is to extract the 20K movies attached to "action" that have value="a". This should be done iteratively, limiting the chunk of data extracted at each execution (e.g. we paginate the query using range).

I'm using Janus 0.6.0 with Cassandra 3.11.6. Please find attached the docker-compose file I've used to create the Janus+Cassandra environment, and also the Janus and Gremlin configurations.

This is the query that we use to extract a page:
g.V().has("name", "action").to(Direction.IN, "has_genre").has("value", "a").range(skip, limit).valueMap("id").next();

Here the results of the extraction with different page size:
  • page size 100 read 200 pages in 591453 ms - average elapsed per page 2957.265 ms - min 284 ms - max 11618 ms
  • page size 1000 read 20 pages in 62293 ms - average elapsed per page 3114.65 ms - min 632 ms - max 9712 ms
This is the profile of the query for the last chunk:

gremlin> g.V().has("name", "action").to(Direction.IN, "has_genre").has("value", "a").range(19900, 20000).valueMap("id").profile();
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[name.eq(action)])                                   1           1           0.904     0.01
  constructGraphCentricQuery                                                                   0.169
  GraphCentricQuery                                                                            0.591
    \_condition=(name = action)
    \_orders=[]
    \_isFitted=true
    \_isOrdered=true
    \_query=multiKSQ[1]
    \_index=ByName
    backend-query                                                      1                       0.460
    \_query=ByName:multiKSQ[1]
JanusGraphMultiQueryStep                                               1           1           0.059     0.00
JanusGraphVertexStep(IN,[has_genre],vertex)                        40000       40000         240.729     2.31
    \_condition=type[has_genre]
    \_orders=[]
    \_isFitted=true
    \_isOrdered=true
    \_query=has_genre:SliceQuery[0x71E1,0x71E2)
    \_multi=true
    \_vertices=1
  optimization                                                                                 0.019
  backend-query                                                    40000                      86.565
    \_query=has_genre:SliceQuery[0x71E1,0x71E2)
HasStep([value.eq(a)])                                             20000       20000       10157.815    97.51
RangeGlobalStep(19900,20000)                                         100         100          15.166     0.15
PropertyMapStep([id],value)                                          100         100           2.791     0.03
                                            >TOTAL                     -           -       10417.467        -
 

It seems that the condition has("value", "a") is evaluated by reading each of the attached nodes one by one and then evaluating the filter. Is this the expected behaviour and performance? Is there any possible optimization in the interaction between Janus and Cassandra (for example, reading attached nodes in bulk)?

We have verified that activating the db-cache (cache.db-cache=true) has a huge impact on performance, but this is not easily applicable in our real scenario because we have multiple Janus nodes (to support scaling of the system), and with the cache active we risk reading stale data (the data are updated frequently and changes must be read by other services in our processing pipeline).

Thank you


Re: GraphTraversal Thread Stuck

ssbothe3@...
 

Hi

You are suggesting the above experiment for isolating the issue, right?


I thought about not using the CQL executor service, but we are at an early stage and have not done any workload tests to figure out the correct CQL driver config params.

So I would prefer to have the safety net of an executor service, which will prevent too many parallel calls to the CQL driver.


And it looks like someone else also faced a similar issue:

 

https://lists.lfaidata.foundation/g/janusgraph-users/topic/thread_goes_into_waiting/79937111?p=,,,20,0,0,0::recentpostdate/sticky,,,20,0,0,79937111,previd=1634107810504447777,nextid=1630650635690684483&previd=1634107810504447777&nextid=1630650635690684483

Thanks,
Sujay Bothe


Re: Thread goes into Waiting state forever

ssbothe3@...
 

Hi Tanroop,

I have also faced the same issue and have posted a query about it on this channel: 'GraphTraversal Thread Stuck'.

Did you find the root cause of the above issue?

Thanks,
Sujay Bothe
