Re: Cassandra 4
hadoopmarc@...
Hi,
There is an issue tracking this, but no PRs yet; see: https://github.com/JanusGraph/janusgraph/issues/2325
Best wishes, Marc
Cassandra 4
Kusnierz.Krzysztof@...
Hi, has anyone tried JanusGraph with Cassandra 4? Does it work?
Re: How to Merge Two Vertices in JanusGraph into single vertex
hadoopmarc@...
Hi Krishna,
Nope. However, you are not the first to ask; see: https://stackoverflow.com/questions/46363737/tinkerpop-gremlin-merge-vertices-and-edges/46435070#46435070
Best wishes, Marc
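The approach from that Stack Overflow answer can be sketched with the TinkerPop structure API: copy the duplicate's edges and properties onto the vertex you keep, then drop the duplicate. This is only a sketch; `keepId` and `dupId` are placeholder ids, and edge or multi-property semantics may need per-schema handling.

```groovy
keep = g.V(keepId).next()
dup  = g.V(dupId).next()

// Re-attach the duplicate's outgoing edges to the kept vertex
dup.edges(Direction.OUT).each { e ->
    def copy = keep.addEdge(e.label(), e.inVertex())
    e.properties().each { p -> copy.property(p.key(), p.value()) }
}
// Re-attach the duplicate's incoming edges
dup.edges(Direction.IN).each { e ->
    def copy = e.outVertex().addEdge(e.label(), keep)
    e.properties().each { p -> copy.property(p.key(), p.value()) }
}
// Copy properties, then remove the duplicate and commit
dup.properties().each { p -> keep.property(p.key(), p.value()) }
dup.remove()
g.tx().commit()
```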
How to Merge Two Vertices in JanusGraph into single vertex
krishna.sailesh2@...
Hi Folks
Re: Usage of CustomID on Vertexes
hadoopmarc@...
Hi Hazal,
Your comment is correct: the graph.set-vertex-id feature is not documented beyond this, so using it is not advised. You are also right that lookups in the index require additional processing. However, depending on the ordering of inserts and their distribution across JanusGraph instances, many lookups can be avoided if vertices are still in the JanusGraph cache. Also, using storage backend ids assigned by JanusGraph will be more efficient for vertex reads later on, because of the id partitioning applied. So I support your presumption that using an indexed property is to be preferred.
Best wishes, Marc
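As a sketch of the indexed-property alternative (the names `legacyId` and `byLegacyId` are made up for illustration): define a unique composite index on the relational key, write it as a normal property during loading, and resolve edge endpoints through the index.

```groovy
mgmt = graph.openManagement()
legacyId = mgmt.makePropertyKey('legacyId').dataType(String.class).make()
mgmt.buildIndex('byLegacyId', Vertex.class).addKey(legacyId).unique().buildCompositeIndex()
mgmt.commit()

// During loading: store the relational id as a plain property...
g.addV('person').property('legacyId', 'row-123').iterate()
// ...and later look up endpoints via the composite index when creating edges
v = g.V().has('legacyId', 'row-123').next()
```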
Re: JanusGraph server clustering with NodeJS
hadoopmarc@...
I read some conceptual confusion, so let me try:
Re: JanusGraph server clustering with NodeJS
sppriyaindu@...
We are also facing a similar issue. Could you please direct us on how to handle a JanusGraph cluster using Node.js?
Usage of CustomID on Vertexes
hazalkecoglu@...
Hi everyone,
I have some confusion about the topic mentioned below; could anyone give a suggestion about it? Is the problem familiar to you, and what was your solution?

I need to load data from a relational database into JanusGraph. I want to use a custom ID while loading vertices. The main reason for using a custom ID is to be able to load related vertices faster by that ID when creating edges between vertices. The documentation says that if I activate the graph.set-vertex-id attribute, some other attributes will be disabled. What are those attributes? Isn't this an approved solution? Or, instead of using the ID to reach a vertex, is it a good solution to reach it by an indexed property? Which will perform better?

Thanks a lot, Hazal
Re: Edge traveresal .hasId() not returning expected results
AC
Thanks Boxuan! I look forward to that release. In the meantime, I was able to work around this issue now that I know it is not producing the expected results, in a way that should be compatible with this change once it is rolled out. I really appreciate your thorough answer and explanation.

On Sat, Nov 6, 2021 at 5:12 PM Boxuan Li <liboxuan@...> wrote: Fixed by https://github.com/JanusGraph/janusgraph/pull/2849 and will be included in the next release (0.6.1).
JanusGraph server clustering with NodeJS
51kumarakhil@...
Hello!
I'm new to JanusGraph and have successfully implemented a "single Janus server, single Node.js client" setup, with Google BigTable as the storage backend. I am able to create multiple dynamic graphs within the same JanusGraph server instance (if anyone needs help here, I'm happy to help).

The thing I'm stuck with is clustering of JanusGraph. The idea is to scale up the current architecture with multiple JanusGraph servers to speed up execution. I have three Google VM instances where I have configured the same JanusGraph server setup. Now, how can I distribute the data to these three JanusGraph servers? I'm using the "gremlin" npm module, but it doesn't have an option to connect to three servers at the same time.

It's been a week that I've been looking for solutions, but I'm not able to find any. Is there anything I'm missing? Is there a better approach, or any solution to my problem? Any help would be great! Thank you! Happy coding!
jvm.options broken
m.leinweber@...
Hello Janusgraphers,
It seems that janusgraph-server.sh line 116 is broken: only the last entry of the jvm.options file is used. br, Matthias
Re: Edge traveresal .hasId() not returning expected results
Boxuan Li
Fixed by https://github.com/JanusGraph/janusgraph/pull/2849 and will be included in the next release (0.6.1).
Re: Edge traveresal .hasId() not returning expected results
Boxuan Li
Hi Adam,
Thanks for reporting! This is a bug and I just created an issue for it: https://github.com/JanusGraph/janusgraph/issues/2848 If you want to know more about this bug, feel free to post a follow-up on that issue.

JanusGraphStep is an implementation of the TinkerPop GraphStep which contains some optimizations. JanusGraph converts a GraphStep to a JanusGraphStep whenever it finds there is room to optimize. In the "g.E().hasId(xx)" example, if we strictly followed the execution order, JanusGraph would load all edges and then do in-memory filtering. That's where `JanusGraphStep` comes into play: it "folds" all `has` conditions (including the `hasId` step) so that an index can potentially be utilized. Unfortunately, due to the bug you found, g.E().hasId(xx) does not work as expected.

If you insert a dummy `map` step in between, like this: g.E().map{t -> t.get()}.hasId("4r6-39s-69zp-3c8") then you will get the result you want. However, this is highly discouraged, as it will aggressively prevent JanusGraph from doing any optimization, and it requires a full scan of all data entries.

Best regards, Boxuan
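To summarize the three variants discussed in this thread (the edge id is taken from Adam's example; behavior as described above for affected versions):

```groovy
g.E('4r6-39s-69zp-3c8')                               // works: id passed to E() directly
g.E().hasId('4r6-39s-69zp-3c8')                       // hit by the bug: returns nothing
g.E().map { t -> t.get() }.hasId('4r6-39s-69zp-3c8')  // workaround, but full scan: discouraged
```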
Edge traveresal .hasId() not returning expected results
AC
Hey folks, I'm seeing strange results trying to use the hasId step on an edge traversal:

@ g.E("4r6-39s-69zp-3c8").toList
res49: List[Edge] = List(e[4r6-39s-69zp-3c8][4240-RetrocomputerPurchaser->4328])

@ g.E().hasId("4r6-39s-69zp-3c8").toList
res50: List[Edge] = List()

@ g.E("4r6-39s-69zp-3c8").traversal.profile().toList
res51: java.util.List[org.apache.tinkerpop.gremlin.process.traversal.util.TraversalMetrics] = [Traversal Metrics
Step                                                               Count  Traversers  Time (ms)  % Dur
=============================================================================================================
GraphStep(edge,[4r6-39s-69zp-3c8])                                     1           1      0.237  100.00
>TOTAL                                                                 -           -      0.237       -]

@ g.E().hasId("4r6-39s-69zp-3c8").traversal.profile().toList
res52: java.util.List[org.apache.tinkerpop.gremlin.process.traversal.util.TraversalMetrics] = [Traversal Metrics
Step                                                               Count  Traversers  Time (ms)  % Dur
=============================================================================================================
JanusGraphStep(edge,[4r6-39s-69zp-3c8])                                                   0.039  100.00
>TOTAL                                                                 -           -      0.039       -]

1) Why would these two traversals produce different results?
2) What's the difference between the GraphStep and JanusGraphStep representations of these traversals? They look the same otherwise via explain/profile.
3) Is there any working encoding of this query starting with an edge traversal g.E() that can produce the same result?

Thanks, - Adam
Re: Options for Bulk Read/Bulk Export
> Also @oleksandr, you have stated that "Otherwise, additional calls might be executed to your backend which could be not as efficient." how should we do these additional calls and get subsequent records. Lets say I'm exporting 10M records and our cache/memory size doesn't support that much, so first I retrieve 1 to 1M records and then 1M to 2M, then 2M to 3M and so on, how can we iterate this way? how can this be achieved in Janus, Please throw some light
Not sure I fully follow, but I will try to clarify some more.

- Vertex ids are not cleared from `vertex`. So, when you return vertices you simply hold them in your heap, but all edges / properties are managed by internal caches. By default, if you return vertices you don't return their properties / edges. To return properties for vertices you might use the `valueMap`, `properties`, or `values` Gremlin steps.

In the previous message I wasn't talking about using Gremlin; I was talking about `multiQuery`, which is a JanusGraph feature. `multiQuery` may store data in the tx-cache if you preload your properties. To use multiQuery you must provide the vertices for which you want to preload properties (think of them as simple vertex ids rather than collections of all vertex data). After you preload properties they are stored in the tx-level cache, and may also be stored in the db-level cache if you enabled that. After that you can access vertex properties without additional calls to the internal database; instead you get those properties from the tx-level cache.

There is a property `cache.tx-cache-size` which says "Maximum size of the transaction-level cache of recently-used vertices." By default it is 20000, but you can configure it individually per transaction when you create your transaction. As you said, you don't have the possibility to store 10M vertices in your cache, so you need to split your work into chunks. Basically something like:

janusGraph.multiQuery().addAllVertices(yourFirstMillionVertices).properties().forEach(
    // process your vertex properties
);
janusGraph.multiQuery().addAllVertices(yourSecondMillionVertices).properties().forEach(
    // process your vertex properties.
    // As yourFirstMillionVertices are processed, they will be evicted from the tx-level cache
    // because yourSecondMillionVertices are now the recently-used vertices.
);
janusGraph.multiQuery().addAllVertices(yourThirdMillionVertices).properties().forEach(
    // process your vertex properties.
    // As yourSecondMillionVertices are processed, they will be evicted from the tx-level cache
    // because yourThirdMillionVertices are now the recently-used vertices.
);
// ...

You may also simply close and reopen transactions when you have processed a chunk of your data. Under the hood multiQuery will use either your backend's batch feature or https://docs.janusgraph.org/configs/configuration-reference/#storageparallel-backend-executor-service

In case you are trying to find a good executor service, I would suggest looking at a scalable executor service like https://github.com/elastic/elasticsearch/blob/dfac67aff0ca126901d72ed7fe862a1e7adb19b0/server/src/main/java/org/elasticsearch/common/util/concurrent/EsExecutors.java#L74-L81 or similar. I wouldn't recommend using executor services without an upper bound, like a cached thread pool, because they are quite dangerous.

Hope it helps somehow. Best regards, Oleksandr
Re: Janusgraph Schema dump
hadoopmarc@...
Hi Pawan,
1. See https://docs.janusgraph.org/schema/#displaying-schema-information
2. See the manuals of your storage and indexing backends (and older questions on this list)
3. Please elaborate; somehow your question does not make sense to me
Best wishes, Marc
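For point 1, a short sketch: recent JanusGraph versions expose a `printSchema()` helper on the management API (see the linked docs; this assumes JanusGraph 0.6+):

```groovy
mgmt = graph.openManagement()
println mgmt.printSchema()   // vertex labels, edge labels, property keys and indexes as text
mgmt.rollback()              // read-only inspection; no schema changes to commit
```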
Re: Janusgraph Schema dump
Pawan Shriwas
Adding one more point:
3. How can we get the mapped properties in a label, or in all labels?
On Thu, Oct 21, 2021 at 7:46 PM Pawan Shriwas <shriwas.pawan@...> wrote:
--
Thanks & Regards, PAWAN SHRIWAS
Janusgraph Schema dump
Pawan Shriwas
Hi All,
1. Database schema dump (want to use the same schema dump on another env using the same export)
2. Database data dump for backup and restore case.
Thanks, Pawan
Re: Options for Bulk Read/Bulk Export
subbu165@...
So currently we have JanusGraph with the storage back-end as FDB and use ElasticSearch for indexing.
First we get the vertex ids from the Elasticsearch index, and then below is what we do:
JanusGraph graph = JanusGraphFactory.open(janusConfig);
Vertex vertex = graph.vertices(vertexId).next();
All the above, including getting the vertex ids from Elasticsearch, happens within the Spark context, using a Spark RDD for partitioning and parallelisation. If we remove Spark from the equation, what is the best way to do a bulk export? Also @oleksandr, you have stated that "Otherwise, additional calls might be executed to your backend which could be not as efficient." How should we do these additional calls and get subsequent records? Let's say I'm exporting 10M records and our cache/memory size doesn't support that much, so first I retrieve records 1 to 1M, then 1M to 2M, then 2M to 3M, and so on. How can we iterate this way? How can this be achieved in Janus? Please throw some light.
Re: Queries with negated text predicates fail with lucene
hadoopmarc@...
Hi Toom,
Yes, you are right, this behavior is not 100% consistent. Also, as noted, the documentation regarding text predicates on properties without an index is incomplete. Use cases are sparse, though, because on a graph of practical size, working without an index is not an option. Finally, improving this in a backward compatible way might prove impossible.
Best wishes, Marc