
Re: Cassandra 4

hadoopmarc@...
 

Hi,

There is an issue tracking this, but no PRs yet; see: https://github.com/JanusGraph/janusgraph/issues/2325

Best wishes,     Marc


Cassandra 4

Kusnierz.Krzysztof@...
 

Hi, has anyone tried JanusGraph with Cassandra 4? Does it work?


Re: How to Merge Two Vertices in JanusGraph into single vertex

hadoopmarc@...
 

Hi Krishna,

Nope. However, you are not the first to ask, see:
https://stackoverflow.com/questions/46363737/tinkerpop-gremlin-merge-vertices-and-edges/46435070#46435070
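For reference, the approach from that answer boils down to re-attaching the edges by hand. A rough Gremlin console sketch (assuming bId and cId are the ids of the two vertices; copying of vertex and edge properties is omitted for brevity):

b = g.V(bId).next(); c = g.V(cId).next()
// re-attach c's edges to b, preserving direction and label
g.V(c).outE().toList().each { e -> b.addEdge(e.label(), e.inVertex()) }
g.V(c).inE().toList().each { e -> e.outVertex().addEdge(e.label(), b) }
g.V(c).drop().iterate()   // finally remove c
g.tx().commit()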

Best wishes,   Marc


How to Merge Two Vertices in JanusGraph into single vertex

krishna.sailesh2@...
 

Hi Folks

Can you please help me with how to merge two vertices in JanusGraph into a single vertex?

I am using Cassandra as the storage backend for JanusGraph, with a unique vertex constraint on the id+name vertex properties. I have two vertices, B (id: 234, name: orange) and C (id: 345, name: orange). I want to merge vertex C into vertex B so that all the edges connected to vertex C are reconnected to vertex B; vertex B then has the edges of both B and C. Apart from adding all the edges of vertex C to vertex B and deleting vertex C, is there any other way to do it?



Thanks
Krishna Sailesh



Re: Usage of CustomID on Vertexes

hadoopmarc@...
 

Hi Hazal,

Your comment is correct: the graph.set-vertex-id feature is not documented further than this, so using it is not advised.

You are also right that lookups in the index require additional processing. However, depending on the ordering of inserts and their distribution across JanusGraph instances, many lookups can be avoided if vertices are still in the JanusGraph cache. Also, using storage backend ids assigned by JanusGraph will be more efficient for vertex reads later on because of the id partitioning applied.

So I support your presumption that using an indexed property is to be preferred.
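To illustrate the two lookup styles (legacyId is a hypothetical property carrying the relational id, backed by a composite index):

g.V().has('legacyId', 1234)   // lookup via the composite index on 'legacyId'
g.V(janusGraphId)             // direct lookup by a JanusGraph-assigned id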

Best wishes,    Marc


Re: JanusGraph server clustering with NodeJS

hadoopmarc@...
 

I read some conceptual confusion, so let me try:
  • a single query is handled by a single JanusGraph instance (in this case a JanusGraph Server)
  • you can handle many queries from many nodejs clients in parallel by configuring multiple JanusGraph Servers and making them accessible via a load balancer
Best wishes,   Marc


Re: JanusGraph server clustering with NodeJS

sppriyaindu@...
 

We are also facing a similar issue. Could you please direct us on how to handle a JanusGraph cluster using Node.js?


Usage of CustomID on Vertexes

hazalkecoglu@...
 

Hi everyone,

I am confused about the topic I describe below; could anyone give any suggestions about it? Does the problem sound familiar to you? What was your solution?

I need to load data from a relational database into JanusGraph. I want to use a custom ID while loading vertices. The main reason is faster lookups: I want to be able to load related vertices by that ID while creating edges between vertices.

The documentation says that if I activate the graph.set-vertex-id attribute, some other attributes will be disabled. Which attributes are those? Is this not an approved solution?
Or, instead of using the ID to reach a vertex, is it a good solution to reach it by an indexed property? Which will perform better?

Thanks a lot,
Hazal 


Re: Edge traversal .hasId() not returning expected results

AC
 

Thanks Boxuan! I look forward to that release. In the meantime, I was able to work around this issue now that I know it is not producing the expected results, in a way that should be compatible with this change once it is rolled out. I really appreciate your thorough answer and explanation.


On Sat, Nov 6, 2021 at 5:12 PM Boxuan Li <liboxuan@...> wrote:
Fixed by https://github.com/JanusGraph/janusgraph/pull/2849 and will be included in the next release (0.6.1).


JanusGraph server clustering with NodeJS

51kumarakhil@...
 

Hello!
I'm new to JanusGraph and have successfully implemented a "single Janus server, single Node.js client" setup. I'm able to create multiple dynamic graphs within the same JanusGraph server instance (if anyone needs help here, I'm happy to help), with Google Bigtable as the storage backend.

The thing I'm stuck with is clustering of JanusGraph. The idea is to scale up the current architecture with multiple JanusGraph servers to speed up execution.
I have three Google VM instances where I have configured the same JanusGraph server setup. Now, how can I distribute the data across these three JanusGraph servers?
I'm using the "gremlin" npm module, but it doesn't have an option to connect to three servers at the same time.

It's been a week that I've been looking for solutions, but I'm not able to find any.
Is there anything I'm missing? Is there a better approach, or any solution to my problem?

Any help would be Great! Thank You!
Happy Coding!


jvm.options broken

m.leinweber@...
 

Hello Janusgraphers,

It seems that janusgraph-server.sh line 116 is broken: only the last entry of the jvm.options file is used.

br,
Matthias


Re: Edge traversal .hasId() not returning expected results

Boxuan Li
 

Fixed by https://github.com/JanusGraph/janusgraph/pull/2849 and will be included in the next release (0.6.1).


Re: Edge traversal .hasId() not returning expected results

Boxuan Li
 

Hi Adam,

Thanks for reporting! This is a bug and I just created an issue for it: https://github.com/JanusGraph/janusgraph/issues/2848. If you want to know more about this bug, feel free to post a follow-up on that issue.

JanusGraphStep is an implementation of the TinkerPop GraphStep which contains some optimizations. JanusGraph converts a GraphStep to a JanusGraphStep whenever it finds there is room to optimize. In the "g.E().hasId(xx)" example, if we strictly followed the execution order, JanusGraph would load all edges and then do an in-memory filtering. That's where `JanusGraphStep` comes into play: it "folds" all `has` conditions (including the `hasId` step) so that an index can potentially be utilized. Unfortunately, due to the bug that you found, g.E().hasId(xx) does not work as expected.

If you insert a dummy `map` step in-between like this:

g.E().map{t -> t.get()}.hasId("4r6-39s-69zp-3c8")

Then you will get the result you want. However, this is highly discouraged, as it will aggressively prevent JanusGraph from doing any optimization, and requires a full scan on all data entries.

Best regards,
Boxuan

On Nov 5, 2021, at 2:01 PM, Adam Crane via lists.lfaidata.foundation <acrane=twitter.com@...> wrote:

Hey folks, I'm seeing strange results trying to use the hasId step on an edge traversal:

@ g.E("4r6-39s-69zp-3c8").toList 
res49: List[Edge] = List(e[4r6-39s-69zp-3c8][4240-RetrocomputerPurchaser->4328])

@ g.E().hasId("4r6-39s-69zp-3c8").toList 
res50: List[Edge] = List()

@ g.E("4r6-39s-69zp-3c8").traversal.profile().toList 
res51: java.util.List[org.apache.tinkerpop.gremlin.process.traversal.util.TraversalMetrics] = [Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
GraphStep(edge,[4r6-39s-69zp-3c8])                                     1           1           0.237   100.00
                                            >TOTAL                     -           -           0.237        -]

@ g.E().hasId("4r6-39s-69zp-3c8").traversal.profile().toList 
res52: java.util.List[org.apache.tinkerpop.gremlin.process.traversal.util.TraversalMetrics] = [Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep(edge,[4r6-39s-69zp-3c8])                                                        0.039   100.00
                                            >TOTAL                     -           -           0.039        -]

1) Why would these two traversals produce different results?
2) What's the difference between the GraphStep and JanusGraphStep representations of these traversals? They look the same otherwise via explain/profile.
3) Is there any working encoding of this query starting with an edge traversal g.E() that can produce the same result?

Thanks,
- Adam


Edge traversal .hasId() not returning expected results

AC
 

Hey folks, I'm seeing strange results trying to use the hasId step on an edge traversal:

@ g.E("4r6-39s-69zp-3c8").toList 
res49: List[Edge] = List(e[4r6-39s-69zp-3c8][4240-RetrocomputerPurchaser->4328])

@ g.E().hasId("4r6-39s-69zp-3c8").toList 
res50: List[Edge] = List()

@ g.E("4r6-39s-69zp-3c8").traversal.profile().toList 
res51: java.util.List[org.apache.tinkerpop.gremlin.process.traversal.util.TraversalMetrics] = [Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
GraphStep(edge,[4r6-39s-69zp-3c8])                                     1           1           0.237   100.00
                                            >TOTAL                     -           -           0.237        -]

@ g.E().hasId("4r6-39s-69zp-3c8").traversal.profile().toList 
res52: java.util.List[org.apache.tinkerpop.gremlin.process.traversal.util.TraversalMetrics] = [Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep(edge,[4r6-39s-69zp-3c8])                                                        0.039   100.00
                                            >TOTAL                     -           -           0.039        -]


1) Why would these two traversals produce different results?
2) What's the difference between the GraphStep and JanusGraphStep representations of these traversals? They look the same otherwise via explain/profile.
3) Is there any working encoding of this query starting with an edge traversal g.E() that can produce the same result?

Thanks,
- Adam


Re: Options for Bulk Read/Bulk Export

Oleksandr Porunov
 

> Also @oleksandr, you stated that "Otherwise, additional calls might be executed to your backend which could be not as efficient." How should we do these additional calls and get subsequent records? Let's say I'm exporting 10M records and our cache/memory size doesn't support that much, so first I retrieve records 1 to 1M, then 1M to 2M, then 2M to 3M, and so on. How can we iterate this way in JanusGraph? Please shed some light.

Not sure I'm fully following, but I will try to add some more clarity.
- Vertex ids are not cleared from the vertex objects. So, when you return vertices you simply hold them in your heap, but all edges / properties are managed by internal caches. By default, if you return vertices you don't return their properties / edges.
To return properties for vertices you might use the `valueMap`, `properties`, or `values` Gremlin steps.
In the previous message I wasn't talking about using Gremlin but about `multiQuery`, which is a JanusGraph feature. `multiQuery` may store data in the tx-cache if you preload your properties.
To use multiQuery you must provide the vertices for which you want to preload properties (think of them as simple vertex ids rather than collections of all vertex data). After you preload properties they are stored in the tx-level cache, and may also be stored in the db-level cache if you enabled it. After that you can access vertex properties without additional calls to the underlying database; instead those properties come from the tx-level cache.
There is a property `cache.tx-cache-size`, described as "Maximum size of the transaction-level cache of recently-used vertices." By default it is 20000, but you can configure it individually per transaction when you create your transaction.
As you said, you don't have the possibility to store 10M vertices in your cache, so you need to split your work into chunks.
Basically something like:
janusGraph.multiQuery().addAllVertices(yourFirstMillionVertices).properties().forEach((vertex, props) -> {
    // process this vertex's properties
});
janusGraph.multiQuery().addAllVertices(yourSecondMillionVertices).properties().forEach((vertex, props) -> {
    // process this vertex's properties.
    // As yourFirstMillionVertices have been processed, they will be evicted from the
    // tx-level cache because yourSecondMillionVertices are now the recently-used vertices.
});
janusGraph.multiQuery().addAllVertices(yourThirdMillionVertices).properties().forEach((vertex, props) -> {
    // process this vertex's properties.
    // Likewise, yourSecondMillionVertices will be evicted in favour of yourThirdMillionVertices.
});
// ...

You may also simply close and reopen transactions when you have processed a chunk of your data.
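For example, a per-transaction cache size can be set when building the transaction (a sketch; the chunk-processing part is up to you):

tx = graph.buildTransaction().vertexCacheSize(250000).start()
// ... process one chunk of vertices through tx ...
tx.commit()   // releases the tx-level cache before the next chunk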

Under the hood, multiQuery will use either your backend's batching feature or https://docs.janusgraph.org/configs/configuration-reference/#storageparallel-backend-executor-service

In case you are trying to find a good executor service, I would suggest looking at a scalable executor service like https://github.com/elastic/elasticsearch/blob/dfac67aff0ca126901d72ed7fe862a1e7adb19b0/server/src/main/java/org/elasticsearch/common/util/concurrent/EsExecutors.java#L74-L81
or similar executor services. I wouldn't recommend using executor services without an upper bound limit, like a cached thread pool, because they are quite dangerous.
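As a minimal illustration of a bounded pool (plain JDK, not JanusGraph-specific): a fixed thread count plus a bounded queue with a back-pressure policy, instead of an unbounded cached pool:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedExecutorDemo {
    public static void main(String[] args) throws Exception {
        // Bounded pool: a fixed number of threads and a capped work queue,
        // so a flood of tasks cannot exhaust memory (unlike a cached pool).
        ExecutorService pool = new ThreadPoolExecutor(
                4, 4,                                       // core and max threads
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(100),              // bounded work queue
                new ThreadPoolExecutor.CallerRunsPolicy()); // back-pressure when full

        Future<Integer> f = pool.submit(() -> 21 * 2);
        System.out.println(f.get()); // prints 42
        pool.shutdown();
    }
}
```

CallerRunsPolicy makes the submitting thread run the task itself when the queue is full, which naturally throttles producers instead of failing or growing without bound.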

Hope it helps somehow.

Best regards,
Oleksandr


Re: Janusgraph Schema dump

hadoopmarc@...
 

Hi Pawan,

1. See https://docs.janusgraph.org/schema/#displaying-schema-information
2. See the manuals of your storage and indexing backends (and older questions on this list)
3. Please elaborate; somehow your question does not make sense to me
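For item 1, the Gremlin console can print the full schema overview (a sketch):

mgmt = graph.openManagement()
println(mgmt.printSchema())   // vertex/edge labels, property keys, indexes
mgmt.rollback()               // read-only, nothing to commit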

Best wishes,

Marc


Re: Janusgraph Schema dump

Pawan Shriwas
 

Adding one more point:

 3. How can we get the properties mapped to a label, or to all labels?


On Thu, Oct 21, 2021 at 7:46 PM Pawan Shriwas <shriwas.pawan@...> wrote:
Hi All,

Can anyone let me know how to do the two items below in JanusGraph?

1. Database Schema dump (Want to use same schema dump on another env using same export) 
2. Database data dump for backup and restore case.

Thanks,
Pawan


--
Thanks & Regard

PAWAN SHRIWAS


Janusgraph Schema dump

Pawan Shriwas
 

Hi All,

Can anyone let me know how to do the two items below in JanusGraph?

1. Database Schema dump (Want to use same schema dump on another env using same export) 
2. Database data dump for backup and restore case.

Thanks,
Pawan


Re: Options for Bulk Read/Bulk Export

subbu165@...
 

So currently we have JanusGraph with FDB as the storage backend, and we use Elasticsearch for indexing.
 
First we get the vertex id indexes from the Elasticsearch backend, and then below is what we do:
JanusGraph graph = JanusGraphFactory.open(janusConfig);
Vertex vertex = graph.vertices(vertexId).next(); 
 
All of the above, including getting the vertex id indexes from Elasticsearch, happens within the Spark context, using Spark RDDs for partitioning and parallelisation. If we remove Spark from the equation, what is the best way to do a bulk export?
Also @oleksandr, you stated that "Otherwise, additional calls might be executed to your backend which could be not as efficient." How should we do these additional calls and get subsequent records? Let's say I'm exporting 10M records and our cache/memory size doesn't support that much, so first I retrieve records 1 to 1M, then 1M to 2M, then 2M to 3M, and so on. How can we iterate this way in JanusGraph? Please shed some light.


Re: Queries with negated text predicates fail with lucene

hadoopmarc@...
 

Hi Toom,

Yes, you are right, this behavior is not 100% consistent. Also, as noted, the documentation regarding text predicates on properties without an index is incomplete. Use cases are sparse, though, because on a graph of practical size, working without an index is not an option. Finally, improving this in a backward-compatible way might prove impossible.

Best wishes,

Marc