
Re: Options for Bulk Read/Bulk Export

Oleksandr Porunov
 

To add to Marc's suggestions, there is also a multiQuery option in janusgraph-core. Notice that it is internal JanusGraph API, not Gremlin, so it might be unavailable to you if you cannot access the JanusGraph internal API for any reason.
If you work with multiQuery like `janusGraph.multiQuery().addAllVertices(yourVertices).properties()` then make sure your transaction cache is at least the size of `yourVertices.size()`. Otherwise, additional calls might be executed against your backend, which could be less efficient.
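
For illustration, a minimal Gremlin console sketch of the above (a sketch under assumptions, not a definitive recipe: the properties file, the batch size of 10000, and fetching the batch via tx.query().limit(...) are all made up for the example):

graph = JanusGraphFactory.open('conf/janusgraph-cql-es.properties')
// Size the transaction's vertex cache to at least the batch size,
// otherwise extra per-vertex calls may go to the storage backend.
tx = graph.buildTransaction().vertexCacheSize(10000).start()
batch = tx.query().limit(10000).vertices().toList()
// One batched backend request for the properties of all vertices in the batch.
props = tx.multiQuery().addAllVertices(batch).properties()
props.each { vertex, properties -> /* export the vertex and its properties */ }
tx.rollback()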

Best regards,
Oleksandr


Re: Options for Bulk Read/Bulk Export

hadoopmarc@...
 

Hi,

There are three solution directions:
  1. if you have keys to your vertices available, either vertex ids or unique values of some vertex property, you can start as many gremlin clients as your backends can handle and distribute the keys over the clients. This is the easy case, but often not applicable.
  2. if there are no keys available, Gremlin can only help you with a full table scan g.V(). If you have a client machine with many cores, the withComputer() step, either with or without spark-local, will help you parallelize the scan (see the sketch below).
  3. you can copy the vertex files from the storage backend and decode them offline. Decoding procedures are implicit in the janusgraph source code, but I am not aware of any library that does this for you explicitly.
You decide, but I would suggest option 2 with spark-local as the option that works out of the box.
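
As a rough illustration of option 2 (a sketch, assuming the stock properties file; note that withComputer() without arguments uses JanusGraph's in-memory FulgoraGraphComputer rather than spark-local, so the graph has to fit in memory):

graph = JanusGraphFactory.open('conf/janusgraph-cql-es.properties')
// OLAP traversal source: traversals run as parallel jobs over all vertices.
g = graph.traversal().withComputer()
// Example full-scan job: count vertices per label.
g.V().groupCount().by(label)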

Best wishes,    Marc


Options for Bulk Read/Bulk Export

subbu165@...
 

Hi there, we have JanusGraph with FoundationDB as the storage backend and Elasticsearch as the index backend. Please let me know the best way to export/read millions of records from JanusGraph, keeping performance in mind. We don't have the option of using Spark in our environment. I have seen hundreds of articles on bulk loading but not on bulk export/read. Any suggestion would be of great help here.


Re: Queries with negated text predicates fail with lucene

toom@...
 

Hi Marc,

IMHO, an index should not prevent a query from working. Moreover, the result of a query should not depend on the backends (storage and index). If an index backend cannot process a predicate, the predicate should be executed as if the index wasn't present.

To clarify, below is a sample of code. The same query works without an index (line 13) and fails with an index (line 31).
     1  // create schema
     2  mgmt = graph.openManagement()
     3  mgmt.makePropertyKey('string').dataType(String.class).cardinality(Cardinality.SINGLE).make()
     4  mgmt.makeVertexLabel('data').make()
     5  mgmt.commit()
     6
     7  // add data
     8  g.addV('data').property('string', 'foo')
     9  ==>v[4120]
    10  g.addV('data').property('string', 'bar')
    11  ==>v[4312]
    12
    13  g.V().hasLabel('data').has('string', textNotContains('bar'))
    14  WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [(~label = data AND string textNotContains bar)]. For better performance, use indexes
    15  ==>v[4120]
    16
    17  // add index with lucene backend
    18  mgmt = graph.openManagement()
    19  string = mgmt.getPropertyKey("string")
    20  mgmt.buildIndex('myindex', Vertex.class).addKey(string, Mapping.TEXTSTRING.asParameter()).buildMixedIndex("search")
    21  mgmt.commit()
    22
    23  // Wait for the index
    24  ManagementSystem.awaitGraphIndexStatus(graph, 'myindex').call()
    25
    26  // Reindex data
    27  mgmt = graph.openManagement()
    28  mgmt.updateIndex(mgmt.getGraphIndex("myindex"), SchemaAction.REINDEX).get()
    29  mgmt.commit()
    30
    31  g.V().hasLabel('data').has('string', textNotContains('bar'))
    32  Could not call index

Regards,

Toom.


Re: Potential transaction issue (JG 0.6.0)

Charles <dostanian@...>
 

I have also encountered this problem, and I have found a reliable way to reproduce it.

The following queries work perfectly in the Gremlin console (both server and client from the JanusGraph 0.6.0 distribution):

gremlin> g.V().has("COMPANY", "companyId", 44507).out("EMPLOYS").has("status", "APPROVED").skip(0).limit(10).elementMap("workerId")
gremlin> g.V().has("COMPANY", "companyId", 44507).out("EMPLOYS").has("status", "APPROVED").order().by("lastName").by("firstName").skip(0).limit(10).elementMap("workerId")

In Java this query fails with the same exception as in your trace. If I pass offset = 0 (zero) it seems to work; fiddling with the offset (3, 5, 10 and so on) it sometimes works and sometimes fails.
 
return traversal.V().has(VertexType.COMPANY.name(), CompanyWrapper.PROP_COMPANY_ID, companyId)
.out(EdgeType.EMPLOYS.name())
.has(WorkerWrapper.PROP_STATUS, WORKER_STATUS_APPROVED)
.skip(offset)
.limit(limit)
.elementMap(properties)
.toStream()
.map(WorkerWrapper::of);

To make the query succeed I have to either remove the skip() and limit() or order the results before skipping and limiting, i.e.

return traversal.V().has(VertexType.COMPANY.name(), CompanyWrapper.PROP_COMPANY_ID, companyId)
.out(EdgeType.EMPLOYS.name())
.has(WorkerWrapper.PROP_STATUS, WORKER_STATUS_APPROVED)
.order()
.by(WorkerWrapper.PROP_LAST_NAME).by(WorkerWrapper.PROP_FIRST_NAME)
.skip(offset)
.limit(limit)
.elementMap(properties)
.toStream()
.map(WorkerWrapper::of);
 
Quite a number of my queries depend on this style of code: start with a known node, traverse edges, apply a has() clause, then skip and limit.

9674 [gremlin-server-exec-2] WARN  org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor  - Exception processing a Traversal on iteration for request [6be9a75f-fa5d-4ef1-a61e-7d133bca33c8].
2021-10-15T21:32:09.932504000Z java.lang.NullPointerException
2021-10-15T21:32:09.932569000Z at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.getInternalVertex(StandardJanusGraphTx.java:508)
2021-10-15T21:32:09.932611000Z at org.janusgraph.graphdb.query.vertex.VertexLongList.get(VertexLongList.java:72)
2021-10-15T21:32:09.932652000Z at org.janusgraph.graphdb.query.vertex.VertexLongList$1.next(VertexLongList.java:144)
2021-10-15T21:32:09.932697000Z at org.janusgraph.graphdb.query.vertex.VertexLongList$1.next(VertexLongList.java:131)
2021-10-15T21:32:09.932734000Z at org.apache.tinkerpop.gremlin.process.traversal.step.map.FlatMapStep.processNextStart(FlatMapStep.java:45)
2021-10-15T21:32:09.932774000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
2021-10-15T21:32:09.932812000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
2021-10-15T21:32:09.932852000Z at org.apache.tinkerpop.gremlin.process.traversal.step.filter.FilterStep.processNextStart(FilterStep.java:37)
2021-10-15T21:32:09.932890000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
2021-10-15T21:32:09.932927000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
2021-10-15T21:32:09.932961000Z at org.apache.tinkerpop.gremlin.process.traversal.step.filter.FilterStep.processNextStart(FilterStep.java:37)
2021-10-15T21:32:09.932995000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
2021-10-15T21:32:09.933033000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
2021-10-15T21:32:09.933068000Z at org.apache.tinkerpop.gremlin.process.traversal.step.map.ScalarMapStep.processNextStart(ScalarMapStep.java:39)
2021-10-15T21:32:09.933109000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
2021-10-15T21:32:09.933143000Z at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:222)
2021-10-15T21:32:09.933188000Z at org.apache.tinkerpop.gremlin.server.util.TraverserIterator.fillBulker(TraverserIterator.java:69)
2021-10-15T21:32:09.933228000Z at org.apache.tinkerpop.gremlin.server.util.TraverserIterator.hasNext(TraverserIterator.java:56)
2021-10-15T21:32:09.933265000Z at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.handleIterator(TraversalOpProcessor.java:410)
2021-10-15T21:32:09.933299000Z at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.lambda$iterateBytecodeTraversal$0(TraversalOpProcessor
 
 


Re: Queries with negated text predicates fail with lucene

hadoopmarc@...
 

Hi Toom,

See https://docs.janusgraph.org/index-backend/text-search/#full-text-search_1

Indeed, the negative text predicates are only available to Elasticsearch (and, apparently as you say, to the CompositeIndex).
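
A possible workaround, untested and only a sketch: express the negation at traversal level with a not() step, so the index only has to answer the positive form (this likely falls back to a full scan, but it should not fail):

g.V().hasLabel('data').not(has('string', textContains('bar')))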

Best wishes,    Marc


Queries with negated text predicates fail with lucene

toom@...
 

Hi,

With JanusGraph 0.6.0 and the Lucene index backend, queries fail if they contain predicates like textNotPrefix or textNotContains:
java.lang.IllegalArgumentException: Relation is not supported for string value: textNotPrefix
        at org.janusgraph.diskstorage.lucene.LuceneIndex.convertQuery(LuceneIndex.java:814)
        at org.janusgraph.diskstorage.lucene.LuceneIndex.convertQuery(LuceneIndex.java:864)
        at org.janusgraph.diskstorage.lucene.LuceneIndex.query(LuceneIndex.java:593)
        at org.janusgraph.diskstorage.indexing.IndexTransaction.queryStream(IndexTransaction.java:110)

If Elasticsearch is used or if there is no index backend, the same query works.
I'm not sure the Lucene index can be used for negated queries, but the queries should not fail. How can I transform my query to make it work?

Regards,

Toom.


potential memory leak

Vivek Singh Raghuwanshi
 

Hi Team,

We are facing some issues with JanusGraph 0.5.3. We captured heap dumps and found what looks like a memory leak.
Can you please check if this is a leak suspect?
[attachment: image.png]



--
ViVek Raghuwanshi
Mobile +1-847-848-7388
Google Number +1-707-847-8481
http://in.linkedin.com/in/vivekraghuwanshi


Re: Query performance with range

hadoopmarc@...
 

Hi Claudio,

Paging with range() can only work with a vertex-centric index; otherwise the vertex table is scanned for every page. If you just want all results, the alternative is to forget about the range() step and iterate over the query result.
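
For reference, a vertex-centric index is defined on an edge label. A minimal sketch, assuming (hypothetically) that the sort property lived on the has_genre edges; in this thread "value" actually sits on the movie vertices, where a vertex-centric index does not apply:

mgmt = graph.openManagement()
value = mgmt.getPropertyKey('value')        // hypothetical edge property
hasGenre = mgmt.getEdgeLabel('has_genre')
// Sort each vertex's has_genre edges by "value" so that slices/pages of
// incident edges can be read without scanning all of them.
mgmt.buildEdgeIndex(hasGenre, 'genreByValue', Direction.BOTH, asc, value)
mgmt.commit()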

Marc


Re: Thread goes into Waiting state forever

hadoopmarc@...
 

Hi Tanroop,

Does the problem also occur if you replace v() with V().limit(1) inside your query?

If not, at what result size does your issue start to occur?

Btw, your post has a typo: "user" and "uidx" should be the same, but I assume it stems from anonymizing your query.

Marc


Query performance with range

Claudio Fumagalli
 

Hi,

I have some performance issues extracting nodes attached to a node, with pagination.
I have a simple graph with CompositeIndex on property name (Please find schema definition in attachments).
The graph has 3 genre nodes:
  • "action" node has 20K attached movies with value="a" and 20K attached movies with value="b".
  • "drama" node has 10K attached movies with value="b"
  • "comedy" node has 10K attached movies with value="c"
Genre nodes have id and name properties; movie nodes have id, name and value properties.

Our goal, given a genre, is to extract the 20K movies attached to "action" that have value="a". This should be done iteratively, limiting the chunk of data extracted in each execution (e.g. we paginate the query using range).

I'm using JanusGraph 0.6.0 with Cassandra 3.11.6. Please find attached the docker-compose file I've used to create the JanusGraph+Cassandra environment, and also the JanusGraph and Gremlin configurations.

This is the query that we use to extract a page:
g.V().has("name", "action").to(Direction.IN, "has_genre").has("value", "a").range(skip, limit).valueMap("id").next();

Here the results of the extraction with different page size:
  • page size 100 read 200 pages in 591453 ms - average elapsed per page 2957.265 ms - min 284 ms - max 11618 ms
  • page size 1000 read 20 pages in 62293 ms - average elapsed per page 3114.65 ms - min 632 ms - max 9712 ms
This is the profile of the query for the last chunk:

gremlin> g.V().has("name", "action").to(Direction.IN, "has_genre").has("value", "a").range(19900, 20000).valueMap("id").profile();
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[name.eq(action)])                                   1           1           0.904     0.01
  constructGraphCentricQuery                                                                   0.169
  GraphCentricQuery                                                                            0.591
    \_condition=(name = action)
    \_orders=[]
    \_isFitted=true
    \_isOrdered=true
    \_query=multiKSQ[1]
    \_index=ByName
    backend-query                                                      1                       0.460
    \_query=ByName:multiKSQ[1]
JanusGraphMultiQueryStep                                               1           1           0.059     0.00
JanusGraphVertexStep(IN,[has_genre],vertex)                        40000       40000         240.729     2.31
    \_condition=type[has_genre]
    \_orders=[]
    \_isFitted=true
    \_isOrdered=true
    \_query=has_genre:SliceQuery[0x71E1,0x71E2)
    \_multi=true
    \_vertices=1
  optimization                                                                                 0.019
  backend-query                                                    40000                      86.565
    \_query=has_genre:SliceQuery[0x71E1,0x71E2)
HasStep([value.eq(a)])                                             20000       20000       10157.815    97.51
RangeGlobalStep(19900,20000)                                         100         100          15.166     0.15
PropertyMapStep([id],value)                                          100         100           2.791     0.03
                                            >TOTAL                     -           -       10417.467        -
 

It seems that the condition has("value", "a") is evaluated by reading each of the attached nodes one by one and then applying the filter. Is this the expected behaviour and performance? Is there any possible optimization in the interaction between JanusGraph and Cassandra (for example, reading attached nodes in bulk)?

We have verified that activating the db-cache (cache.db-cache=true) has a huge impact on performance, but this is not easily applicable in our real scenario because we have multiple JanusGraph nodes (to support the scaling of the system), and with the cache active we risk reading stale data (the data are updated frequently and changes must be read by other services in our processing pipeline).
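
Side note: if a bounded amount of staleness were acceptable, tuning the cache expiration instead of disabling the cache entirely might be a compromise; a sketch of the relevant properties, with illustrative values:

cache.db-cache=true
# upper bound (ms) on how long other JanusGraph nodes may read stale entries
cache.db-cache-time=10000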

Thank you


Re: GraphTraversal Thread Stuck

ssbothe3@...
 

Hi

You are suggesting the above experiment to isolate the issue, right?


I thought about not using the CQL executor service, but we are at an early stage and have not done any workload tests to figure out the correct CQL driver config params.

So I would prefer to keep the safety net of an executor service, which prevents too many parallel calls to the CQL driver.


And it looks like someone else has faced a similar issue:

 

https://lists.lfaidata.foundation/g/janusgraph-users/topic/thread_goes_into_waiting/79937111?p=,,,20,0,0,0::recentpostdate/sticky,,,20,0,0,79937111,previd=1634107810504447777,nextid=1630650635690684483&previd=1634107810504447777&nextid=1630650635690684483

Thanks,
Sujay Bothe


Re: Thread goes into Waiting state forever

ssbothe3@...
 

Hi Tanroop,

I have also faced the same issue and posted a query about it on this channel: 'GraphTraversal Thread Stuck'.

Did you find the root cause of the above issue?

Thanks,
Sujay Bothe


Re: Janusgraph upgrade 0.5.2 --> 0.6.0 | CQL issue

hadoopmarc@...
 

Hi Pawan,

OK, I did a small upgrade session myself and did not encounter any issues (apart from having to set num_tokens: 4 in conf/cassandra.yaml).

This is what I did:
  1. Start from a fresh janusgraph-full-0.5.3 distribution (so, with an empty db folder)
  2. Run bin/janusgraph.sh start
  3. Run bin/gremlin.sh and
    graph = JanusGraphFactory.open('conf/janusgraph-cql-es.properties')
    GraphOfTheGodsFactory.load(graph)
    :q
  4. Run bin/janusgraph.sh stop
  5. Start from a fresh janusgraph-full-0.6.0 distribution (so, with an empty db folder)
  6. Copy the db folder from janusgraph-full-0.5.3/db to janusgraph-full-0.6.0
  7. Set num_tokens: 4 in conf/cassandra.yaml
  8. Run bin/janusgraph.sh start
  9. Run bin/gremlin.sh and
    graph = JanusGraphFactory.open('conf/janusgraph-cql-es.properties')
    graph.traversal().V()
Can you reproduce your issue in a similar way?

Best wishes,     Marc


Re: JanusGraph custom types

schwartz@...
 

I've seen this - https://docs.janusgraph.org/advanced-topics/serializer/
But an example would greatly help (btw, we develop mostly in Python).
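
For what it's worth, the linked page describes implementing org.janusgraph.core.attribute.AttributeSerializer and registering it in the graph's properties file. A minimal sketch with hypothetical class, package and property names (note this covers the storage-level serializer; exposing the type through GraphSON to Python clients is a separate registration step):

import org.janusgraph.core.attribute.AttributeSerializer
import org.janusgraph.diskstorage.ScanBuffer
import org.janusgraph.diskstorage.WriteBuffer

// Hypothetical custom value class to be stored on properties.
class IsoDate implements Serializable {
    long epochMillis
}

// Serializer that writes the value as a single long.
class IsoDateSerializer implements AttributeSerializer<IsoDate> {
    IsoDate read(ScanBuffer buffer) {
        new IsoDate(epochMillis: buffer.getLong())
    }
    void write(WriteBuffer buffer, IsoDate attribute) {
        buffer.putLong(attribute.epochMillis)
    }
}

// Then registered in the graph's properties file (index and names are assumptions):
// attributes.custom.attribute10.attribute-class=com.example.IsoDate
// attributes.custom.attribute10.serializer-class=com.example.IsoDateSerializer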


JanusGraph custom types

schwartz@...
 

Hi!

Is there a way to add custom property types to JanusGraph (similar to RelationIdentifier / Geoshape) to be later used by the GraphSON serializer? Ideally without re-building JanusGraph.

Due to some downstream limitations, we sometimes can't serialize Python datetime objects. It would be great if we could somehow add a custom step such that .values("date-field").as_iso() would return the field in ISO 8601.

Many thanks!


Re: Janusgraph upgrade 0.5.2 --> 0.6.0 | CQL issue

Pawan Shriwas
 

Hi Marc,

I have also tried with "graph.allow-upgrade=true" and added "storage.cql.request-timeout" as well, but I am still facing the same issue.

Please check the attached janusgraph server log for your reference.

janusgraph version 0.6.0
Cassandra details -   Cassandra 3.9 Snapshot | CQL 3.4.2

Thanks,
Pawan


On Fri, Oct 8, 2021 at 8:52 PM <hadoopmarc@...> wrote:
Hi Pawan,

I may lack the practical experience here, but it seems you have to set graph.allow-upgrade=true (although it is not clear whether this applies to all version changes or only to "storage version" changes):

https://docs.janusgraph.org/v0.6/configs/configuration-reference/

If this is the solution, the error message is certainly not helpful and the upgrade docs are incomplete.

Best wishes,   Marc



--
Thanks & Regard

PAWAN SHRIWAS


Re: Janusgraph upgrade 0.5.2 --> 0.6.0 | CQL issue

hadoopmarc@...
 

Hi Pawan,

I may lack the practical experience here, but it seems you have to set graph.allow-upgrade=true (although it is not clear whether this applies to all version changes or only to "storage version" changes):

https://docs.janusgraph.org/v0.6/configs/configuration-reference/

If this is the solution, the error message is certainly not helpful and the upgrade docs are incomplete.

Best wishes,   Marc


Re: Janusgraph upgrade 0.5.2 --> 0.6.0 | CQL issue

Pawan Shriwas
 

Hi Marc,

Thanks  for your reply,

But nothing was changed on the Cassandra cluster side. The keyspace is already there with graph data, and it was working before the JanusGraph version upgrade. I don't know why, but it seems the new version (0.6.0) tries to create the keyspace again even though that keyspace already exists in the DB and the user has access to it.

Thanks,
Pawan

On Fri, Oct 8, 2021 at 12:07 PM <hadoopmarc@...> wrote:
Hi Pawan,

Caused by: com.datastax.oss.driver.api.core.servererrors.UnauthorizedException: Unauthorized. User graph_user has no CREATE permission on <all keyspaces> or any of its parents
CREATE KEYSPACE IF NOT EXISTS graphDataTable WITH replication={'replication_factor':1,'class':'SimpleStrategy'}
This part of your stack trace suggests that something changed on your Cassandra cluster or with the graph_user. So, check the CREATE permission with your admin, or try to create a keyspace yourself with cqlsh.

Also note that a few changes in cql properties had to be made (but probably not related):
https://docs.janusgraph.org/v0.6/changelog/

Best wishes,     Marc



--
Thanks & Regard

PAWAN SHRIWAS


Re: Janusgraph upgrade 0.5.2 --> 0.6.0 | CQL issue

hadoopmarc@...
 

Hi Pawan,

Caused by: com.datastax.oss.driver.api.core.servererrors.UnauthorizedException: Unauthorized. User graph_user has no CREATE permission on <all keyspaces> or any of its parents
CREATE KEYSPACE IF NOT EXISTS graphDataTable WITH replication={'replication_factor':1,'class':'SimpleStrategy'}
This part of your stack trace suggests that something changed on your Cassandra cluster or with the graph_user. So, check the CREATE permission with your admin, or try to create a keyspace yourself with cqlsh.

Also note that a few changes in cql properties had to be made (but probably not related):
https://docs.janusgraph.org/v0.6/changelog/

Best wishes,     Marc
