Re: Index stuck on INSTALLED (single instance of JanusGraph)
schwartz@...
It seems that I had lots of instances registered in the cluster, probably due to unclean shutdowns.
I got the list using mgmt.getOpenInstances().toList(). I closed all instances except the current one and committed, hoping that this would move the index status to REGISTERED. Yet, nothing happens.
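For reference, a minimal Gremlin Console sketch of the cleanup described above (the index name 'myIndex' is a placeholder); forceCloseInstance() is the documented way to evict stale instances, and only afterwards can the index acknowledge registration:

```groovy
// Gremlin Console sketch; 'myIndex' is hypothetical
mgmt = graph.openManagement()
mgmt.getOpenInstances().each { println it }        // the entry marked '(current)' must stay
mgmt.getOpenInstances().findAll { !it.contains('(current)') }.
     each { mgmt.forceCloseInstance(it) }
mgmt.commit()
// once the stale instances are gone, wait for the status transition
ManagementSystem.awaitGraphIndexStatus(graph, 'myIndex').
                 status(SchemaStatus.REGISTERED).call()
```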
Index stuck on INSTALLED (single instance of JanusGraph)
schwartz@...
I tried adding a composite index based on 2 existing properties.
As far as I understand, the initial state is INSTALLED; after all instances become aware of it, it should become REGISTERED. Only then should I re-index to make the index ENABLED. My index remains INSTALLED. The JanusGraph server has no other instances (GKE deployment with just 1 replica). What needs to be done for the index to transition from INSTALLED to REGISTERED? Many thanks! Assaf
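For context, the usual lifecycle commands look roughly like this (the index name 'myIndex' is a placeholder); a sketch, assuming a single healthy instance:

```groovy
// wait until every instance has acknowledged the new index
ManagementSystem.awaitGraphIndexStatus(graph, 'myIndex').
                 status(SchemaStatus.REGISTERED).call()
// then reindex existing data, which moves the index towards ENABLED
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex('myIndex'), SchemaAction.REINDEX).get()
mgmt.commit()
```

If the index never leaves INSTALLED, a common cause is stale instances listed by mgmt.getOpenInstances() that can no longer acknowledge the index.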
Re: How to replay a transaction log from the beginning
Boxuan Li
Hi Ojas,
Ideally, by using Instant.now() when adding your log processor, you should see your callback invoked as soon as the transaction completes (if you are using a single in-memory storage backend), or with minimal delay (depending on the read latency of your storage backend). The time difference in your log looks a bit weird to me. Can you check if there is clock drift among your servers? Best, Boxuan
On Wed, Jun 30, 2021, at 10:12 PM, ojas.dubey via lists.lfaidata.foundation <ojas.dubey=amdocs.com@...> wrote: Hi Boxuan,
Re: Indexing on sub-attribute of custom data type
hadoopmarc@...
Regarding documentation on custom attributes: Jason Plurad published an example project a few years ago (so, for an older JanusGraph version).
See https://github.com/pluradj/janusgraph-attribute-serializer
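For completeness, a custom attribute class and its serializer are registered in the graph properties file; a sketch following the pattern from the linked example, with hypothetical class names:

```properties
# position 10 is arbitrary; both class names are placeholders
attributes.custom.attribute10.attribute-class = com.example.MyCustomType
attributes.custom.attribute10.serializer-class = com.example.MyCustomTypeSerializer
```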
Re: Indexing on sub-attribute of custom data type
hadoopmarc@...
Hi Ronnie,
Actually, "creating an associated vertex which defines this custom data type" sounds like an excellent idea! If an attribute is important enough to define an index on, it probably deserves to be a first class citizen in the graph. Answers to the other questions:
Marc
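To illustrate the "first class citizen" suggestion: instead of serializing, say, an Address as a custom attribute, it can be stored as its own vertex with indexable property keys. A sketch with hypothetical labels and keys:

```groovy
// schema: index the sub-attribute directly
mgmt = graph.openManagement()
city = mgmt.makePropertyKey('city').dataType(String.class).make()
mgmt.buildIndex('addressByCity', Vertex.class).addKey(city).buildCompositeIndex()
mgmt.commit()

// data: person -[hasAddress]-> address, instead of person.address = <custom blob>
g.addV('person').property('name', 'ronnie').as('p').
  addV('address').property('city', 'Oslo').property('street', 'Main St').as('a').
  addE('hasAddress').from('p').to('a').iterate()

// querying on the sub-attribute now uses the composite index
g.V().has('address', 'city', 'Oslo').in('hasAddress').valueMap('name')
```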
Indexing on sub-attribute of custom data type
Ronnie
Hi,
A few questions related to custom data types (https://docs.janusgraph.org/basics/common-questions/#custom-class-datatype):
1. Is it possible to index on a sub-attribute of a custom data type? If not, is there any alternative other than creating an associated vertex which defines this custom data type?
2. Is attribute cardinality like SET / LIST supported with custom data types?
Thanks, Ronnie
Re: Union with Count returning unexpected results
hadoopmarc@...
Hi Vinayak,
I guess this has to do with differences in lazy vs eager evaluation between the two queries. The TinkerPop ref docs reference the aggregated values with cap('ACount','E1Count','BCount','E2Count','CCount'), rather than with select(), to force eager evaluation, see: https://tinkerpop.apache.org/docs/current/reference/#store-step
Best wishes, Marc

For other readers, please find the queries from the original post in a more readable format:

g2.inject(1).union(
  V().has('title', 'A').aggregate('v1').union(
    outE().has('title', 'E1').aggregate('e').inV().has('title', 'B'),
    outE().has('title', 'E2').aggregate('e').inV().has('title', 'C')
  ).aggregate('v2')
).
select('v1').dedup().as('sourceCount').
select('e').dedup().as('edgeCount').
select('v2').dedup().as('destinationCount').
select('sourceCount', 'edgeCount', 'destinationCount').by(unfold().count())

g2.inject(1).union(
  V().has('title', 'A').aggregate('A').union(
    outE().has('title', 'E1').aggregate('E1').inV().has('title', 'B').aggregate('B'),
    outE().has('title', 'E2').aggregate('E2').inV().has('title', 'C').aggregate('C')
  )
).
select('A').dedup().as('ACount').
select('E1').dedup().as('E1Count').
select('B').dedup().as('BCount').
select('E2').dedup().as('E2Count').
select('C').dedup().as('CCount').
select('ACount', 'E1Count', 'BCount', 'E2Count', 'CCount').by(unfold().count())
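A sketch of Marc's suggestion applied to the first query (untested against the original dataset): reference the side-effect collections with cap() so they are materialized eagerly, then count each collection. BulkSet.uniqueSize() plays the role of dedup().count():

```groovy
// cap() forces eager evaluation of the side-effect collections
m = g2.V().has('title', 'A').aggregate('v1').
       union(outE().has('title', 'E1').aggregate('e').inV().has('title', 'B'),
             outE().has('title', 'E2').aggregate('e').inV().has('title', 'C')).
       aggregate('v2').
       cap('v1', 'e', 'v2').next()
// each value is a BulkSet; uniqueSize() counts distinct elements
m.collectEntries { k, v -> [(k): v.uniqueSize()] }
```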
Re: How to replay a transaction log from the beginning
ojas.dubey@...
Hi Boxuan,
Thanks. This indeed helped. Initially nothing happened (or at least it appeared that way), so I changed the start time to EPOCH and left the application running for a while, and after some time the callback was executed. So I was wondering how the log processor uses the start time value to replay the log, and why it took so long to replay the logs. Is there a way I can reduce the time by setting the correct UTC start time (as I don't want to use EPOCH every time) so that the callback is executed immediately? Also, is there a difference between the value of Instant.now() used by the ReadMarker and the actual local time used by the application? The ReadMarker initialization logs showed a different time, e.g.:
2021-06-30T13:21:32.003+05:30 INFO |InternalEventLogger|||||||o.j.diskstorage.log.kcvs.KCVSLog|Loaded identified ReadMarker start time 2021-06-30T04:00:00Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@4051e47b
2021-06-30T13:21:32.008+05:30 INFO |InternalEventLogger|||||||o.j.diskstorage.log.kcvs.KCVSLog|Loaded identified ReadMarker start time 2021-06-30T04:00:00Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@6a332fb7
2021-06-30T13:21:32.013+05:30 INFO |InternalEventLogger|||||||o.j.diskstorage.log.kcvs.KCVSLog|Loaded identified ReadMarker start time 2021-06-30T04:00:00Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@1f2cf847
2021-06-30T13:21:32.015+05:30 INFO |InternalEventLogger|||||||o.j.diskstorage.log.kcvs.KCVSLog|Loaded identified ReadMarker start time 2021-06-30T04:00:00Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@3cc2c61b
while the application log shows another time:
2021-06-30T13:26:15.794+05:30 INFO |InternalEventLogger||c.a.o.s.b.s.i.Test|Started tx standardjanusgraphtx[0x39c4068c] for requestId 5ba073c8-68c2-4356-8097-2e62ef56299a and batchId 9632dceb-7996-4464-91d8-1b157fc8ca00
Regards, Ojas
Re: How to replay a transaction log from the beginning
Boxuan Li
Hi Ojas,
Your `startLogProcessor` method looks good to me. I suspect that you are not using the transaction returned in step 1 to do the vertex/edge operations. In step 2, you are using `g.addV`, which automatically starts a new anonymous transaction. When you commit that transaction with `g.tx().commit()`, it will of course not be captured by your log processor. Therefore, you need to make sure you are using the transaction associated with the log processor to do the mutations. Try replacing `g` with `tx.traversal()`, where `tx` is the transaction returned in step 1. Then, your code should look like this: JanusGraphTransaction tx = startJanusGraphTransaction(identifier); Hope this helps. Best, Boxuan
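Spelled out, the corrected flow might look like this (identifier and property values are placeholders); the key point is that every mutation goes through the traversal of the logged transaction:

```groovy
// start a transaction bound to the log identifier
tx = graph.buildTransaction().logIdentifier('TestBatchLogger').start()
g = tx.traversal()            // traversal bound to the logged transaction
g.addV('idVertex').property('uuid', 'some-id').next()
tx.commit()                   // this commit is written to the tx log and seen by the processor
```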
Re: How to replay a transaction log from the beginning
ojas.dubey@...
Hi Boxuan,
Please find the code below.

1. Starting the transaction (identifier value is TestBatchLogger):

public JanusGraphTransaction startJanusGraphTransaction(String identifier) {
return janusGraphSchema.getConfiguredGraph().buildTransaction().logIdentifier(identifier).start();
}
2. Multiple add vertex/edge operations on the graph through (e.g.):

GraphTraversal<Vertex, Vertex> traversal = g.addV("idVertex")
.property(id, "uuid");
return traversal.next();
Here g is the Gremlin GraphTraversalSource obtained from JanusGraphFactory.open(<graphConfigPropertiesFile>).traversal().

3. Commit on the transaction object returned by the start transaction method.

So I wanted to replay the logs of this transaction. For this I made a call to the method below:

public void startLogProcessor(String identifier) {
LogProcessorFramework logProcessor =
JanusGraphFactory.openTransactionLog(graph);
logProcessor.addLogProcessor(identifier).
setProcessorIdentifier("BatchTxLogger").
setStartTime(Instant.now()).
addProcessor((tx, txId, changeState) -> {
System.out.println("tx--"+tx.toString() + " txId--"+txId.toString()
+" changeState--"+changeState.toString());
for (JanusGraphVertex v : changeState.getVertices(Change.ANY)) {
System.out.println(v.label());
}
}).build();
}

But here I am unable to get the sysout. I tried different combinations of startTime (Instant.EPOCH, Instant.now().minusMillis(500), etc.) but did not get the println output on the console (no exception or error in any case). I also tried removing the identifier, which gave the invalid ReadMarker error. So after checking the class files I also removed the start time to resolve the error. But still no console output :(
Regards, Ojas
Re: How to replay a transaction log from the beginning
Boxuan Li
Hi Ojas,
Can you share your code and explain what you mean by "unable to work"? Is it running but not producing results as you expected, or encountering errors/exceptions? Best, Boxuan
Re: How to replay a transaction log from the beginning
ojas.dubey@...
Hi,
I was wondering if this had been implemented.
I am running JanusGraph over Cassandra and was trying to work with the transaction log feature using the provided documentation.
So far I have managed to start the transaction with the identifier (the ulog tables are created in Cassandra), but I am still unable to get the Java callback to work. I have browsed through some threads here as well, without success.
Any help is appreciated.
Regards,
Ojas
Union with Count returning unexpected results
Vinayak Bali
Hi All,
The objective is to count the number of nodes and edges.

Query:
g2.inject(1).union(V().has('title', 'A').aggregate('v1').union(outE().has('title', 'E1').aggregate('e').inV().has('title', 'B'),outE().has('title', 'E2').aggregate('e').inV().has('title', 'C')).aggregate('v2')).select('v1').dedup().as('sourceCount').select('e').dedup().as('edgeCount').select('v2').dedup().as('destinationCount').select('sourceCount','edgeCount','destinationCount').by(unfold().count())

[ { "sourceCount": 1203, "edgeCount": 9922, "destinationCount": 9926 } ]

But when the aggregate steps are placed inside the union query to count each type of node for the inner union, the results are different.

Query:
g2.inject(1).union(V().has('title', 'A').aggregate('A').union(outE().has('title', 'E1').aggregate('E1').inV().has('title', 'B').aggregate('B'),outE().has('title', 'E2').aggregate('E2').inV().has('title', 'C').aggregate('C'))).select('A').dedup().as('ACount').select('E1').dedup().as('E1Count').select('B').dedup().as('BCount').select('E2').dedup().as('E2Count').select('C').dedup().as('CCount').select('ACount','E1Count','BCount','E2Count','CCount').by(unfold().count())

[ { "vendorCount": 1203, "supply1Count": 4, "productCount": 4, "supplyCount": 0, "materialCount": 0 } ]

The node and edge counts don't match after this small change. Request you to take a look and share your thoughts.
Thanks & Regards, Vinayak
Re: Dynamic control of graph configuration
Boxuan Li
Have you tried https://docs.janusgraph.org/basics/transactions/#transaction-configuration ? It allows you to enable/disable storage.batch-loading per transaction.
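For illustration, a per-transaction override looks roughly like this; enableBatchLoading() is part of the TransactionBuilder API:

```groovy
// a transaction with batch loading enabled, without touching the global setting
tx = graph.buildTransaction().enableBatchLoading().start()
g = tx.traversal()
// ... bulk mutations ...
tx.commit()
```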
Re: Dynamic control of graph configuration
hadoopmarc@...
Hi Frederick,
This is a good question, but I have no answer. For myself, I have always taken a fresh JanusGraph instance when I wanted to change one of the properties! Best wishes, Marc
Re: Avoiding duplicate vertex creation using unique indices
hadoopmarc@...
Hi Umesh,
I read this yesterday and thought your reasoning was sound, but at the same time it seemed unlikely it was in the ref docs for no reason. Just now, a scenario appeared to me where both locks are relevant, but actually this scenario speaks in favor of your approach! If you take a lock on both the property key and the index, then with parallel transactions one transaction could get the lock on the property key and the other on the index (and then, presumably, other mechanisms would make both transactions fail). If you want to pursue this matter, you will have to investigate which scenarios are covered by tests in the JanusGraph git repo and try to introduce a failing test. I also thought about threaded transactions on a single JanusGraph instance where, in one transaction, two threads try to add a name to the same vertex, but that scenario should be handled by the cardinality of the property. Best wishes, Marc
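A sketch of the parallel-transaction scenario under discussion (the vertex id `vid` and the values are hypothetical): with cardinality SINGLE and no LOCK on 'name', the last commit silently wins; with ConsistencyModifier.LOCK on the property key, the losing transaction fails with a locking exception instead:

```groovy
// two transactions mutate the same single-cardinality property on one vertex
tx1 = graph.newTransaction()
tx2 = graph.newTransaction()
tx1.traversal().V(vid).property('name', 'alice').iterate()
tx2.traversal().V(vid).property('name', 'bob').iterate()
tx1.commit()
tx2.commit()   // without the lock: silently overwrites; with the lock: locking exception
```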
Re: Failed to connect to a Cassandra cluster's all nodes without default port
zhouyu74748585@...
It works well with version 0.6.0-SNAPSHOT.
I hope the release version comes soon.
Re: Failed to connect to a Cassandra cluster's all nodes without default port
Boxuan Li
Sorry, I misunderstood your problem. I thought you had a cluster with different transport ports, which is not supported prior to Cassandra 4.
It turns out your cluster is using a uniform non-default transport port for every host. I am not 100% sure, but it seems this is fixed in the DataStax Java driver 4, which is included in JanusGraph 0.6.0. Can you try the master version of JanusGraph and see if you still have this problem?
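For reference, a uniform non-default port is configured once for the whole cluster in the graph properties file (hostnames and port are placeholders):

```properties
storage.backend = cql
storage.hostname = cas-node1,cas-node2,cas-node3
storage.port = 9043
```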
Re: Failed to connect to a Cassandra cluster's all nodes without default port
Boxuan Li
Got it. It does not seem to be a JanusGraph problem. I didn't dig deep into it, but it seems to be a limitation of Cassandra. See https://datastax-oss.atlassian.net/browse/JAVA-1388 and other tickets mentioned there.
Right now I would suggest you avoid using different native transport ports in the same Cassandra cluster.
Avoiding duplicate vertex creation using unique indices
Umesh Gade <er.umeshgade@...>
Hi All,
To avoid duplicate vertex creation due to parallel transactions, we are using index uniqueness over the property which defines the uniqueness of a vertex. As per the docs, we need to specify a lock on both the index and the property (https://docs.janusgraph.org/advanced-topics/eventual-consistency/):

mgmt.setConsistency(name, ConsistencyModifier.LOCK)  // Ensures only one name per vertex
mgmt.setConsistency(index, ConsistencyModifier.LOCK) // Ensures name uniqueness in the graph

As per our observation, specifying the lock only on the index blocks parallel transaction commits, which avoids duplicate vertex creation. We didn't see any behavior change with or without the lock on the property. Can anybody help me understand the significance of the lock on the property, i.e. name in the above example? Any example scenario to understand the meaning of "Ensures only one name per vertex"?
Sincerely, Umesh Gade