
Re: JanusGraph server clustering with NodeJS

sppriyaindu@...
 

We are also facing a similar issue. Could you please direct us on how to handle a JanusGraph cluster using Node.js?


Usage of CustomID on Vertexes

hazalkecoglu@...
 

Hi everyone,

I am confused about the topic I describe below. Could anyone offer a suggestion? Does the problem look familiar to you, and if so, what was your solution?

I need to load data from a relational database into JanusGraph, and I want to use a custom ID while loading vertices. The main reason for using a custom ID is performance: when creating edges between vertices, I want to look up the related vertices directly by that ID.

The documentation says that if I activate the graph.set-vertex-id option, some other options are disabled. Which options are those? Is this an approved solution?
Or, instead of looking up a vertex by ID, is it a good solution to look it up via an indexed property? Which performs better?
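
For context, this is roughly what I have in mind, based on my reading of the docs (a sketch only; the id conversion call is my understanding of the set-vertex-id documentation, and the variable names are mine):

// Sketch: supplying my own vertex id while loading, with graph.set-vertex-id=true.
StandardJanusGraph graph = (StandardJanusGraph) JanusGraphFactory.open("conf/janusgraph.properties");
long rowId = 12345L;  // primary key coming from the relational table
long janusId = graph.getIDManager().toVertexId(rowId);  // convert to a valid JanusGraph vertex id
JanusGraphVertex v = graph.addVertex(T.id, janusId);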

Thanks a lot,
Hazal 


Re: Edge traversal .hasId() not returning expected results

AC
 

Thanks Boxuan! I look forward to that release. In the meantime, now that I know it is not producing the expected results, I was able to work around this issue in a way that should be compatible with this change once it is rolled out. I really appreciate your thorough answer and explanation.


On Sat, Nov 6, 2021 at 5:12 PM Boxuan Li <liboxuan@...> wrote:
Fixed by https://github.com/JanusGraph/janusgraph/pull/2849 and will be included in the next release (0.6.1).


JanusGraph server clustering with NodeJS

51kumarakhil@...
 

Hello!
I'm new to JanusGraph and have successfully implemented a "single JanusGraph server, single Node.js client" setup. I am able to create multiple dynamic graphs within the same JanusGraph server instance (if anyone needs help there, I'm happy to help), with Google Bigtable as the storage backend.

What I'm stuck on is clustering JanusGraph. The idea is to scale up the current architecture with multiple JanusGraph servers to speed up execution.
I have three Google VM instances on which I have configured the same JanusGraph server setup. Now, how can I distribute the data across these three JanusGraph servers?
I'm using the "gremlin" npm module, but it doesn't offer an option to connect to three servers at the same time.

It's been a week that I've been looking for a solution, but I have not been able to find one.
Is there anything I'm missing? Is there a better approach, or any solution to my problem?

Any help would be Great! Thank You!
Happy Coding!


jvm.options broken

m.leinweber@...
 

Hello Janusgraphers,

It seems that janusgraph-server.sh line 116 is broken: only the last entry of the jvm.options file is used.

br,
Matthias


Re: Edge traversal .hasId() not returning expected results

Boxuan Li
 

Fixed by https://github.com/JanusGraph/janusgraph/pull/2849 and will be included in the next release (0.6.1).


Re: Edge traversal .hasId() not returning expected results

Boxuan Li
 

Hi Adam,

Thanks for reporting! This is a bug, and I have just created https://github.com/JanusGraph/janusgraph/issues/2848 for it. If you want to know more about this bug, feel free to post a follow-up on that issue.

JanusGraphStep is an implementation of TinkerPop's GraphStep which contains some optimizations. JanusGraph converts a GraphStep into a JanusGraphStep whenever it finds room to optimize. In the "g.E().hasId(xx)" example, if we strictly followed the execution order, JanusGraph would load all edges and then do an in-memory filtering. That's where `JanusGraphStep` comes into play: it "folds in" all `has` conditions (including the `hasId` step) so that an index can potentially be utilized. Unfortunately, due to the bug that you found, g.E().hasId(xx) does not work as expected.

If you insert a dummy `map` step in-between like this:

g.E().map{t -> t.get()}.hasId("4r6-39s-69zp-3c8")

Then you will get the result you want. However, this is highly discouraged, as it aggressively prevents JanusGraph from doing any optimization and requires a full scan of all data entries.

Best regards,
Boxuan

On Nov 5, 2021, at 2:01 PM, Adam Crane via lists.lfaidata.foundation <acrane=twitter.com@...> wrote:

Hey folks, I'm seeing strange results trying to use the hasId step on an edge traversal:

@ g.E("4r6-39s-69zp-3c8").toList 
res49: List[Edge] = List(e[4r6-39s-69zp-3c8][4240-RetrocomputerPurchaser->4328])

@ g.E().hasId("4r6-39s-69zp-3c8").toList 
res50: List[Edge] = List()

@ g.E("4r6-39s-69zp-3c8").traversal.profile().toList 
res51: java.util.List[org.apache.tinkerpop.gremlin.process.traversal.util.TraversalMetrics] = [Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
GraphStep(edge,[4r6-39s-69zp-3c8])                                     1           1           0.237   100.00
                                            >TOTAL                     -           -           0.237        -]

@ g.E().hasId("4r6-39s-69zp-3c8").traversal.profile().toList 
res52: java.util.List[org.apache.tinkerpop.gremlin.process.traversal.util.TraversalMetrics] = [Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep(edge,[4r6-39s-69zp-3c8])                                                        0.039   100.00
                                            >TOTAL                     -           -           0.039        -]

1) Why would these two traversals produce different results?
2) What's the difference between the GraphStep and JanusGraphStep representations of these traversals? They look the same otherwise via explain/profile.
3) Is there any working encoding of this query starting with an edge traversal g.E() that can produce the same result?

Thanks,
- Adam


Edge traversal .hasId() not returning expected results

AC
 

Hey folks, I'm seeing strange results trying to use the hasId step on an edge traversal:

@ g.E("4r6-39s-69zp-3c8").toList 

res49: List[Edge] = List(e[4r6-39s-69zp-3c8][4240-RetrocomputerPurchaser->4328])


@ g.E().hasId("4r6-39s-69zp-3c8").toList 

res50: List[Edge] = List()


@ g.E("4r6-39s-69zp-3c8").traversal.profile().toList 

res51: java.util.List[org.apache.tinkerpop.gremlin.process.traversal.util.TraversalMetrics] = [Traversal Metrics

Step                                                               Count  Traversers       Time (ms)    % Dur

=============================================================================================================

GraphStep(edge,[4r6-39s-69zp-3c8])                                     1           1           0.237   100.00

                                            >TOTAL                     -           -           0.237        -]


@ g.E().hasId("4r6-39s-69zp-3c8").traversal.profile().toList 

res52: java.util.List[org.apache.tinkerpop.gremlin.process.traversal.util.TraversalMetrics] = [Traversal Metrics

Step                                                               Count  Traversers       Time (ms)    % Dur

=============================================================================================================

JanusGraphStep(edge,[4r6-39s-69zp-3c8])                                                        0.039   100.00

                                            >TOTAL                     -           -           0.039        -]


1) Why would these two traversals produce different results?
2) What's the difference between the GraphStep and JanusGraphStep representations of these traversals? They look the same otherwise via explain/profile.
3) Is there any working encoding of this query starting with an edge traversal g.E() that can produce the same result?

Thanks,
- Adam


Re: Options for Bulk Read/Bulk Export

Oleksandr Porunov
 

> Also @oleksandr, you stated that "Otherwise, additional calls might be executed to your backend which could be not as efficient." How should we do these additional calls and get subsequent records? Let's say I'm exporting 10M records and our cache/memory size doesn't support that much, so first I retrieve records 1 to 1M, then 1M to 2M, then 2M to 3M, and so on. How can we iterate this way? How can this be achieved in JanusGraph? Please throw some light on this.

Not sure I'm fully following, but I will try to add some more clarity.
- Vertex ids are not cleared from `vertex` objects. So when you return vertices you simply hold them in your heap, but all edges / properties are managed by internal caches. By default, if you return vertices you don't return their properties / edges.
To return properties for vertices you can use the `valueMap`, `properties`, or `values` gremlin steps.
In the previous message I wasn't talking about Gremlin but about `multiQuery`, which is a JanusGraph feature. `multiQuery` may store data in the tx-cache if you preload your properties.
To use multiQuery you must provide the vertices for which you want to preload properties (think of them as simple vertex ids rather than a collection of all vertex data). After you preload properties they are stored in the tx-level cache, and they may also be stored in the db-level cache if you have enabled it. After that you can access vertex properties without additional calls to the underlying database; the properties are served from the tx-level cache instead.
There is a property `cache.tx-cache-size`, described as `Maximum size of the transaction-level cache of recently-used vertices.`. By default it is 20000, but you can configure it individually per transaction when you create your transaction.
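
As an illustration only, a minimal sketch of sizing the vertex cache per transaction (assuming the `buildTransaction()` / `vertexCacheSize()` builder methods; the chunk size is arbitrary):

// Sketch: a transaction whose tx-level vertex cache can hold one chunk.
JanusGraphTransaction tx = janusGraph.buildTransaction()
        .vertexCacheSize(1_000_000)
        .start();
tx.multiQuery(firstMillionVertices)   // vertices previously loaded in this transaction
        .properties()
        .forEach((vertex, props) -> {
            // process the preloaded properties here
        });
tx.commit();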
As you said, if you don't have the possibility to store 10M vertices in your cache, then you need to split your work into chunks.
Basically something like:
janusGraph.multiQuery().addAllVertices(yourFirstMillionVertices).properties().forEach(
// process your vertex properties
);
janusGraph.multiQuery().addAllVertices(yourSecondMillionVertices).properties().forEach(
// process your vertex properties.
// As yourFirstMillionVertices have been processed, they will be evicted from the tx-level cache because yourSecondMillionVertices are now the recently-used vertices.
);
janusGraph.multiQuery().addAllVertices(yourThirdMillionVertices).properties().forEach(
// process your vertex properties
// As yourSecondMillionVertices have been processed, they will be evicted from the tx-level cache because yourThirdMillionVertices are now the recently-used vertices.
);
// ...
// ...

You may also simply close and reopen transactions when you have processed a chunk of your data.

Under the hood, multiQuery will use either your db's batching feature or https://docs.janusgraph.org/configs/configuration-reference/#storageparallel-backend-executor-service

In case you are trying to find a good executor service, I would suggest looking at a scalable executor service like https://github.com/elastic/elasticsearch/blob/dfac67aff0ca126901d72ed7fe862a1e7adb19b0/server/src/main/java/org/elasticsearch/common/util/concurrent/EsExecutors.java#L74-L81
or similar executor services. I wouldn't recommend executor services without an upper bound, like a cached thread pool, because they are quite dangerous.
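
As a plain-Java sketch of the difference (pool and queue sizes are arbitrary):

import java.util.concurrent.*;

// Sketch: a bounded executor vs. an unbounded cached thread pool.
ExecutorService bounded = new ThreadPoolExecutor(
        4, 16,                              // core and maximum pool size
        30, TimeUnit.SECONDS,               // keep-alive for idle threads
        new LinkedBlockingQueue<>(1_000));  // bounded work queue
// By contrast, Executors.newCachedThreadPool() can create an unbounded
// number of threads under load, which is the danger mentioned above.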

Hope it helps somehow.

Best regards,
Oleksandr


Re: Janusgraph Schema dump

hadoopmarc@...
 

Hi Pawan,

1. See https://docs.janusgraph.org/schema/#displaying-schema-information (and the sketch below)
2. See the manuals of your storage and indexing backends (and older questions on this list)
3. Please elaborate; as it stands, your question does not make sense to me
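
For point 1, a minimal sketch (assuming your JanusGraph version has printSchema() on the management API):

// Sketch: print a text summary of labels, property keys and indexes.
JanusGraphManagement mgmt = graph.openManagement();
System.out.println(mgmt.printSchema());
mgmt.rollback();  // read-only inspection, nothing to commit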

Best wishes,

Marc


Re: Janusgraph Schema dump

Pawan Shriwas
 

Adding one more point:

 3. How can we get the properties mapped to one label, or to all labels?


On Thu, Oct 21, 2021 at 7:46 PM Pawan Shriwas <shriwas.pawan@...> wrote:
Hi All,

Can anyone let me know how to do the two items below in JanusGraph?

1. Database schema dump (I want to apply the same schema to another environment using the export) 
2. Database data dump for the backup and restore case.

Thanks,
Pawan


--
Thanks & Regard

PAWAN SHRIWAS


Janusgraph Schema dump

Pawan Shriwas
 

Hi All,

Can anyone let me know how to do the two items below in JanusGraph?

1. Database schema dump (I want to apply the same schema to another environment using the export) 
2. Database data dump for the backup and restore case.

Thanks,
Pawan


Re: Options for Bulk Read/Bulk Export

subbu165@...
 

So currently we have JanusGraph with FoundationDB (FDB) as the storage backend and Elasticsearch for indexing.
 
First we get the vertex ids from the Elasticsearch index backend, and then we do the following:
JanusGraph graph = JanusGraphFactory.open(janusConfig);
Vertex vertex = graph.vertices(vertexId).next(); 
 
All of the above, including getting the vertex ids from Elasticsearch, happens within the Spark context, using a Spark RDD for partitioning and parallelization. If we take Spark out of the equation, what is the best way to do a bulk export?
Also @oleksandr, you stated that "Otherwise, additional calls might be executed to your backend which could be not as efficient." How should we do these additional calls and get subsequent records? Let's say I'm exporting 10M records and our cache/memory size doesn't support that much, so first I retrieve records 1 to 1M, then 1M to 2M, then 2M to 3M, and so on. How can we iterate this way? How can this be achieved in JanusGraph? Please throw some light on this.


Re: Queries with negated text predicates fail with lucene

hadoopmarc@...
 

Hi Toom,

Yes, you are right, this behavior is not 100% consistent. Also, as noted, the documentation regarding text predicates on properties without an index is incomplete. Use cases are sparse, though, because on a graph of practical size, working without an index is not an option. Finally, improving this in a backward-compatible way might prove impossible.

Best wishes,

Marc


Re: potential memory leak

Oleksandr Porunov
 

Hi ViVek,

I would suggest upgrading to version 0.6.0 of JanusGraph.
It's hard to understand your case from the amount of information you provided. Generally, I would suggest checking that you always close transactions. Even read queries open new transactions when you read from JanusGraph. If you are using a cached thread pool for your connections (the default option for Tomcat, for example), you may open transactions whose threads are later evicted from the pool while the underlying transactions are never closed automatically (only manually). Thus, a leak could happen in this situation. That said, I would simply suggest analyzing your heap dumps to see what exactly is happening in your situation; that way you may find the cause of the leak.
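
As a sketch of the pattern (names are illustrative), every read path should end by committing or rolling back:

// Sketch: guarantee a read transaction is closed even on failure.
JanusGraphTransaction tx = graph.newTransaction();
try {
    // ... read-only work against tx ...
    tx.commit();
} finally {
    if (tx.isOpen()) {
        tx.rollback();  // releases the transaction if commit didn't happen
    }
}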

Best regards,
Oleksandr


Re: Options for Bulk Read/Bulk Export

Oleksandr Porunov
 

To add to Marc's suggestions, there is also a multiQuery option in janusgraph-core. Note that it is an internal API, not Gremlin; thus, it might be unavailable to you if you cannot access the JanusGraph internal API for any reason.
If you work with multiQuery like `janusGraph.multiQuery().addAllVertices(yourVertices).properties()`, then make sure your transaction cache is at least the size of `yourVertices.size()`. Otherwise, additional calls might be executed against your backend, which could be less efficient.

Best regards,
Oleksandr


Re: Options for Bulk Read/Bulk Export

hadoopmarc@...
 

Hi,

There are three solution directions:
  1. If you have keys to your vertices available, either vertex ids or unique values of some vertex property, you can start as many gremlin clients as your backends can handle and distribute the keys over the clients. This is the easy case, but often not applicable.
  2. If there are no keys available, gremlin can only help you with a full table scan, g.V(). If you have a client machine with many cores, the withComputer() step, either with or without spark-local, will help you parallelize the scan (see the sketch below).
  3. You can copy the vertex files from the storage backend and decode them offline. The decoding procedures are implicit in the janusgraph source code, but I am not aware of any library that does this for you explicitly.
You decide, but I would suggest option 2 with spark-local as the option that works out of the box.
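
For option 2, a minimal sketch against an embedded graph (assuming TinkerPop's withComputer() with the graph's default graph computer; replace the println with your export sink):

// Sketch: parallel full scan of all vertices via a GraphComputer-backed traversal.
JanusGraph graph = JanusGraphFactory.open("conf/janusgraph.properties");
GraphTraversalSource g = graph.traversal().withComputer();
g.V().valueMap(true).forEachRemaining(System.out::println);
graph.close();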

Best wishes,

Marc


Options for Bulk Read/Bulk Export

subbu165@...
 

Hi there, we have JanusGraph with FoundationDB as the backend store and Elasticsearch as the index backend. Please let me know the best way to export/read millions of records from JanusGraph, keeping performance in mind. We don't have the option of using Spark in our environment. I have seen hundreds of articles on bulk loading, but not on bulk export/read. Any suggestion would be of great help here.


Re: Queries with negated text predicates fail with lucene

toom@...
 

Hi Marc,

IMHO, an index should not prevent a query from working. Moreover, the result of a query should not depend on the backends (storage and index). If an index backend cannot process a predicate, the predicate should be executed as if the index wasn't present.

To clarify, below is a code sample. The same query works without an index (line 13) and fails with an index (line 31).
     1  // create schema
     2  mgmt = graph.openManagement()
     3  mgmt.makePropertyKey('string').dataType(String.class).cardinality(Cardinality.SINGLE).make()
     4  mgmt.makeVertexLabel('data').make()
     5  mgmt.commit()
     6
     7  // add data
     8  g.addV('data').property('string', 'foo')
     9  ==>v[4120]
    10  g.addV('data').property('string', 'bar')
    11  ==>v[4312]
    12
    13  g.V().hasLabel('data').has('string', textNotContains('bar'))
    14  WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [(~label = data AND string textNotContains bar)]. For better performance, use indexes
    15  ==>v[4120]
    16
    17  // add index with lucene backend
    18  mgmt = graph.openManagement()
    19  string = mgmt.getPropertyKey("string")
    20  mgmt.buildIndex('myindex', Vertex.class).addKey(string, Mapping.TEXTSTRING.asParameter()).buildMixedIndex("search")
    21  mgmt.commit()
    22
    23  // Wait for the indexes
    24  ManagementSystem.awaitGraphIndexStatus(graph, 'myindex').call()
    25
    26  // Reindex data
    27  mgmt = graph.openManagement()
    28  mgmt.updateIndex(mgmt.getGraphIndex("myindex"), SchemaAction.REINDEX).get()
    29  mgmt.commit()
    30
    31  g.V().hasLabel('data').has('string', textNotContains('bar'))
    32  Could not call index

Regards,

Toom.


Re: Potential transaction issue (JG 0.6.0)

Charles <dostanian@...>
 

I have also encountered this problem, but I have found a reliable way to reproduce it.

The following queries work perfectly in the Gremlin Console (both server and client from the JanusGraph 0.6.0 distribution):

gremlin> g.V().has("COMPANY", "companyId", 44507).out("EMPLOYS").has("status", "APPROVED").skip(0).limit(10).elementMap("workerId")
gremlin> g.V().has("COMPANY", "companyId", 44507).out("EMPLOYS").has("status", "APPROVED").order().by("lastName").by("firstName").skip(0).limit(10).elementMap("workerId")

In Java this query fails with the same exception as in your trace if I pass offset = 0 (zero); fiddling with the offset (3, 5, 10 and so on), it seems to work sometimes.
 
return traversal.V().has(VertexType.COMPANY.name(), CompanyWrapper.PROP_COMPANY_ID, companyId)
.out(EdgeType.EMPLOYS.name())
.has(WorkerWrapper.PROP_STATUS, WORKER_STATUS_APPROVED)
.skip(offset)
.limit(limit)
.elementMap(properties)
.toStream()
.map(WorkerWrapper::of);

To make the query succeed I have to either remove the skip() and limit(), or order the results before skipping and limiting, i.e.

return traversal.V().has(VertexType.COMPANY.name(), CompanyWrapper.PROP_COMPANY_ID, companyId)
.out(EdgeType.EMPLOYS.name())
.has(WorkerWrapper.PROP_STATUS, WORKER_STATUS_APPROVED)
.order()
.by(WorkerWrapper.PROP_LAST_NAME).by(WorkerWrapper.PROP_FIRST_NAME)
.skip(offset)
.limit(limit)
.elementMap(properties)
.toStream()
.map(WorkerWrapper::of);
 
Quite a number of my queries depend on this style of code: start with a known node, traverse edges, apply a has clause, and then skip and limit.

9674 [gremlin-server-exec-2] WARN  org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor  - Exception processing a Traversal on iteration for request [6be9a75f-fa5d-4ef1-a61e-7d133bca33c8].
2021-10-15T21:32:09.932504000Z java.lang.NullPointerException
2021-10-15T21:32:09.932569000Z at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.getInternalVertex(StandardJanusGraphTx.java:508)
2021-10-15T21:32:09.932611000Z at org.janusgraph.graphdb.query.vertex.VertexLongList.get(VertexLongList.java:72)
2021-10-15T21:32:09.932652000Z at org.janusgraph.graphdb.query.vertex.VertexLongList$1.next(VertexLongList.java:144)
2021-10-15T21:32:09.932697000Z at org.janusgraph.graphdb.query.vertex.VertexLongList$1.next(VertexLongList.java:131)
2021-10-15T21:32:09.932734000Z at org.apache.tinkerpop.gremlin.process.traversal.step.map.FlatMapStep.processNextStart(FlatMapStep.java:45)
2021-10-15T21:32:09.932774000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
2021-10-15T21:32:09.932812000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
2021-10-15T21:32:09.932852000Z at org.apache.tinkerpop.gremlin.process.traversal.step.filter.FilterStep.processNextStart(FilterStep.java:37)
2021-10-15T21:32:09.932890000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
2021-10-15T21:32:09.932927000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
2021-10-15T21:32:09.932961000Z at org.apache.tinkerpop.gremlin.process.traversal.step.filter.FilterStep.processNextStart(FilterStep.java:37)
2021-10-15T21:32:09.932995000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
2021-10-15T21:32:09.933033000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
2021-10-15T21:32:09.933068000Z at org.apache.tinkerpop.gremlin.process.traversal.step.map.ScalarMapStep.processNextStart(ScalarMapStep.java:39)
2021-10-15T21:32:09.933109000Z at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
2021-10-15T21:32:09.933143000Z at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:222)
2021-10-15T21:32:09.933188000Z at org.apache.tinkerpop.gremlin.server.util.TraverserIterator.fillBulker(TraverserIterator.java:69)
2021-10-15T21:32:09.933228000Z at org.apache.tinkerpop.gremlin.server.util.TraverserIterator.hasNext(TraverserIterator.java:56)
2021-10-15T21:32:09.933265000Z at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.handleIterator(TraversalOpProcessor.java:410)
2021-10-15T21:32:09.933299000Z at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.lambda$iterateBytecodeTraversal$0(TraversalOpProcessor
 
 
