
Re: Index stuck on INSTALLED (single instance of JanusGraph)

schwartz@...
 

It seems that I had lots of instances registered in the cluster, probably due to unclean shutdowns.
I got the list by using mgmt.getOpenInstances().toList()

I closed all instances except for the current one, and committed, hoping that this would move the index status to REGISTERED.
Yet, nothing happened.
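
In the Gremlin console, the cleanup looked roughly like this (instance ids elided; 'byTitle' stands in for my real index name):

mgmt = graph.openManagement()
mgmt.getOpenInstances()                      // the active instance is the one suffixed with '(current)'
mgmt.forceCloseInstance('<staleInstanceId>') // repeated for every stale instance
mgmt.commit()

// afterwards, the index still does not reach REGISTERED:
ManagementSystem.awaitGraphIndexStatus(graph, 'byTitle').status(SchemaStatus.REGISTERED).call()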


Index stuck on INSTALLED (single instance of JanusGraph)

schwartz@...
 

I tried adding a composite index based on 2 existing properties.
As far as I understand, the initial state is INSTALLED; then, after all instances become aware of it, it should become REGISTERED.
Only then should I re-index to make the index ENABLED.

My index remains INSTALLED. The JanusGraph server has no other instances (GKE deployment, with just 1 replica).
What needs to be done for the index to transition from INSTALLED to REGISTERED?
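
For reference, the index was created roughly like this in the Gremlin console ('byTwoProps', 'prop1' and 'prop2' stand in for my real names):

mgmt = graph.openManagement()
mgmt.buildIndex('byTwoProps', Vertex.class).
     addKey(mgmt.getPropertyKey('prop1')).
     addKey(mgmt.getPropertyKey('prop2')).
     buildCompositeIndex()
mgmt.commit()

// waiting for REGISTERED never completes:
ManagementSystem.awaitGraphIndexStatus(graph, 'byTwoProps').status(SchemaStatus.REGISTERED).call()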

Many thanks!
Assaf


Re: How to replay a transaction log from the begining

Boxuan Li
 

Hi Ojas,

Ideally, by using Instant.now() to add your log processor, you should be able to see your callback invoked as soon as the transaction completes (if you are using a single in-memory storage backend), or with a minimal delay (depending on the read latency of your storage backend).

The time difference in your log looks a bit weird to me. Can you check if there is a clock drift among your servers?
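
A minimal sketch (untested; assumes an open `graph`) of the setup I mean, using the same LogProcessorFramework API as your code:

LogProcessorFramework logProcessor = JanusGraphFactory.openTransactionLog(graph);
logProcessor.addLogProcessor("TestBatchLogger").
        setProcessorIdentifier("BatchTxLogger").
        setStartTime(Instant.now()).     // start reading at the current time: near-immediate callbacks
        // setStartTime(Instant.EPOCH).  // replays the entire log instead, which can take a while
        addProcessor((tx, txId, changeState) -> System.out.println(txId)).
        build();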

Best,
Boxuan



Re: Indexing on sub-attribute of custom data type

hadoopmarc@...
 

Regarding documentation on custom attributes: Jason Plurad published an example project a few years ago (so, for an older JanusGraph version).

See https://github.com/pluradj/janusgraph-attribute-serializer


Re: Indexing on sub-attribute of custom data type

hadoopmarc@...
 

Hi Ronnie,

Actually, "creating an associated vertex which defines this custom data type" sounds like an excellent idea! If an attribute is important enough to define an index on, it probably deserves to be a first class citizen in the graph.

Answers to the other questions:

  1. Not for the MixedIndex; it does not support Object type keys (https://docs.janusgraph.org/index-backend/search-predicates/#data-type-support). A CompositeIndex on an Object type key is possible, but it would still be an ugly approach, because the index would compare entire objects based on the implemented "equals" method of the custom attribute (which in your case would compare one specific sub-attribute).
  2. I tried to test this in an example, but got stuck (for now) on the limited documentation in https://docs.janusgraph.org/advanced-topics/serializer/; see the configuration sketch below. Otherwise, there is no reason why these cardinalities would not be supported. If you are still interested, we can try to get this working (and add it to the documentation).
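
For reference, registering a custom attribute class and its serializer goes through the graph properties file, roughly like this (the class names are made up for illustration; 'attribute10' is an arbitrary unique position):

attributes.custom.attribute10.attribute-class=com.example.MyCustomType
attributes.custom.attribute10.serializer-class=com.example.MyCustomTypeSerializer

where MyCustomTypeSerializer implements org.janusgraph.core.attribute.AttributeSerializer<MyCustomType>.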
Best wishes,

Marc


Indexing on sub-attribute of custom data type

Ronnie
 

Hi,
A few questions related to custom data types (https://docs.janusgraph.org/basics/common-questions/#custom-class-datatype):
1. Is it possible to index on a sub-attribute of a custom data type? If not, is there any other alternative other than creating an associated vertex which defines this custom data type?
2. Is attribute cardinality like SET / LIST supported with custom data type?

Thanks,
Ronnie


Re: Union with Count returning unexpected results

hadoopmarc@...
 

Hi Vinayak,

I guess this has to do with differences in lazy vs eager evaluation between the two queries. The TinkerPop ref docs reference the aggregated values with cap('ACount','E1Count','BCount','E2Count','CCount'), rather than with select(), to force eager evaluation, see: https://tinkerpop.apache.org/docs/current/reference/#store-step

Best wishes,    Marc

For other readers, please find the queries from the original post in a better readable format:

g2.inject(1).union(
  V().has('title', 'A').aggregate('v1').union(
    outE().has('title', 'E1').aggregate('e').inV().has('title', 'B'),
    outE().has('title', 'E2').aggregate('e').inV().has('title','C')
    ).aggregate('v2')
  ).
  select('v1').dedup().as('sourceCount').
  select('e').dedup().as('edgeCount').
  select('v2').dedup().as('destinationCount').
  select('sourceCount','edgeCount','destinationCount').by(unfold().count())


g2.inject(1).union(
  V().has('title', 'A').aggregate('A').union(
    outE().has('title', 'E1').aggregate('E1').inV().has('title', 'B').aggregate('B'),
    outE().has('title', 'E2').aggregate('E2').inV().has('title','C').aggregate('C')
    )
  ).
  select('A').dedup().as('ACount').
  select('E1').dedup().as('E1Count').
  select('B').dedup().as('BCount').
  select('E2').dedup().as('E2Count').
  select('C').dedup().as('CCount').
  select('ACount','E1Count','BCount','E2Count','CCount').by(unfold().count())
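
An untested sketch of the cap() variant of the second query; the counts can then be read from the sizes of the five collections in the returned map (after dedup on the client side):

g2.inject(1).union(
  V().has('title', 'A').aggregate('A').union(
    outE().has('title', 'E1').aggregate('E1').inV().has('title', 'B').aggregate('B'),
    outE().has('title', 'E2').aggregate('E2').inV().has('title','C').aggregate('C')
    )
  ).
  cap('A', 'E1', 'B', 'E2', 'C')  // emits a single Map with all five side-effect collections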


Re: How to replay a transaction log from the begining

ojas.dubey@...
 

Hi Boxuan,

Thanks, this indeed helped. Initially nothing happened (or at least it appeared that way), so I changed the start time to EPOCH and left the application running for a while; after some time the callback was executed.

So I was wondering how the log processor uses the start time value to replay the log, and why it took so long to replay the logs. Is there a way to reduce the time by setting the correct UTC start time (as I don't want to use EPOCH every time) so that the callback is executed immediately?

Also, is there a difference between the Instant.now() value used by the ReadMarker and the actual local time used by the application? The ReadMarker initialization logs show a different time, e.g.

2021-06-30T13:21:32.003+05:30 INFO |InternalEventLogger|||||||o.j.diskstorage.log.kcvs.KCVSLog|Loaded identified ReadMarker start time 2021-06-30T04:00:00Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@4051e47b
2021-06-30T13:21:32.008+05:30 INFO |InternalEventLogger|||||||o.j.diskstorage.log.kcvs.KCVSLog|Loaded identified ReadMarker start time 2021-06-30T04:00:00Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@6a332fb7
2021-06-30T13:21:32.013+05:30 INFO |InternalEventLogger|||||||o.j.diskstorage.log.kcvs.KCVSLog|Loaded identified ReadMarker start time 2021-06-30T04:00:00Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@1f2cf847
2021-06-30T13:21:32.015+05:30 INFO |InternalEventLogger|||||||o.j.diskstorage.log.kcvs.KCVSLog|Loaded identified ReadMarker start time 2021-06-30T04:00:00Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@3cc2c61b

while the application log shows another time

2021-06-30T13:26:15.794+05:30 INFO |InternalEventLogger||c.a.o.s.b.s.i.Test|Started tx standardjanusgraphtx[0x39c4068c] for requestId 5ba073c8-68c2-4356-8097-2e62ef56299a and batchId 9632dceb-7996-4464-91d8-1b157fc8ca00


Regards,
Ojas 


Re: How to replay a transaction log from the begining

Boxuan Li
 

Hi Ojas,

Your `startLogProcessor` method looks good to me. I suspect that you are not using the transaction returned in step 1 to do the vertex/edge operations. In step 2 you are using `g.addV`, which automatically starts a new anonymous transaction; committing that one via `g.tx().commit()` will, of course, not be captured by your log processor. Therefore, you need to make sure you do the mutations through the transaction associated with the log identifier.

Try replacing `g` with `tx.traversal()` where `tx` is returned in step 1. Then, your code should look like this:

JanusGraphTransaction tx = startJanusGraphTransaction(identifier); // tx carries the log identifier
tx.traversal().addV("idVertex").property(id, "uuid").next();       // mutate via tx.traversal(), not the shared g
tx.commit();                                                       // commit the same tx so the change is logged

Hope this helps.

Best,
Boxuan
 


Re: How to replay a transaction log from the begining

ojas.dubey@...
 

Hi Boxuan,
Please find the code below:

1. Starting the transaction (identifier value is TestBatchLogger)

public JanusGraphTransaction startJanusGraphTransaction(String identifier) {
    return janusGraphSchema.getConfiguredGraph().buildTransaction().logIdentifier(identifier).start();
}

2. Multiple add-vertex/add-edge operations on the graph, e.g.:

GraphTraversal<Vertex, Vertex> traversal = g.addV("idVertex")
        .property(id, "uuid");
return traversal.next();

Here g is the Gremlin GraphTraversalSource obtained from JanusGraphFactory.open(<graphConfigPropertiesFile>).traversal().

3. Commit on the transaction object returned by the start-transaction method.

So I wanted to replay the logs of this transaction. For this I called the method below:

public void startLogProcessor(String identifier) {
    LogProcessorFramework logProcessor =
            JanusGraphFactory.openTransactionLog(graph);
    logProcessor.addLogProcessor(identifier).
            setProcessorIdentifier("BatchTxLogger").
            setStartTime(Instant.now()).
            addProcessor((tx, txId, changeState) -> {
                System.out.println("tx--" + tx + "  txId--" + txId
                        + "  changeState--" + changeState);
                for (JanusGraphVertex v : changeState.getVertices(Change.ANY)) {
                    System.out.println(v.label());
                }
            }).build();
}

But here I am unable to get the sysout output. I tried different combinations of startTime (Instant.EPOCH, Instant.now().minusMillis(500), etc.) but did not get the println output on the console (no exception or error in any case).
I also tried removing the identifier, which gave an invalid ReadMarker error. So after checking the class files I also removed the start time to resolve that error. But still no console output :(

Regards,
Ojas


Re: How to replay a transaction log from the begining

Boxuan Li
 

Hi Ojas,

Can you share your code and explain what you mean by "unable to work"? Is it running but not producing results as you expected, or encountering errors/exceptions?

Best,
Boxuan


Re: How to replay a transaction log from the begining

ojas.dubey@...
 

Hi,
 
Was wondering if this had been implemented.
 
I am running JanusGraph over Cassandra and was trying to work with the transaction log feature using the provided documentation.
 
So far I have managed to start the transaction with the identifier (the ulog tables are created in Cassandra), but am still unable to get the Java callback to work. I have browsed through some threads here as well, but still have not been able to get it working.
 
Any help is appreciated.
 
 
Regards,
Ojas


Union with Count returning unexpected results

Vinayak Bali
 

Hi All, 

The objective is to count the number of nodes and edges. 

Query:
g2.inject(1).union(V().has('title', 'A').aggregate('v1').union(outE().has('title', 'E1').aggregate('e').inV().has('title', 'B'),outE().has('title', 'E2').aggregate('e').inV().has('title', 'C')).aggregate('v2')).select('v1').dedup().as('sourceCount').select('e').dedup().as('edgeCount').select('v2').dedup().as('destinationCount').select('sourceCount','edgeCount','destinationCount').by(unfold().count())

[
    {
        "sourceCount": 1203,
        "edgeCount": 9922,
        "destinationCount": 9926
    }
]

But when the aggregate steps are placed inside the inner union query, to count each type of node and edge separately, the results are different.

Query: 
g2.inject(1).union(V().has('title', 'A').aggregate('A').union(outE().has('title', 'E1').aggregate('E1').inV().has('title', 'B').aggregate('B'),outE().has('title', 'E2').aggregate('E2').inV().has('title', 'C').aggregate('C'))).select('A').dedup().as('ACount').select('E1').dedup().as('E1Count').select('B').dedup().as('BCount').select('E2').dedup().as('E2Count').select('C').dedup().as('CCount').select('ACount','E1Count','BCount','E2Count','CCount').by(unfold().count())

[
    {
        "vendorCount": 1203,
        "supply1Count": 4,
        "productCount": 4,
        "supplyCount": 0,
        "materialCount": 0
    }
]

The node and edge counts don't match after applying this small change. Please take a look and share your thoughts.

Thanks & Regards,
Vinayak




Re: Dynamic control of graph configuration

Boxuan Li
 

Have you tried https://docs.janusgraph.org/basics/transactions/#transaction-configuration? It allows you to enable/disable storage.batch-loading per transaction.
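
An untested Java sketch of what that looks like:

// per-transaction override of storage.batch-loading
JanusGraphTransaction tx = graph.buildTransaction()
        .enableBatchLoading()
        .start();
tx.traversal().addV("bulkVertex").next();  // "bulkVertex" is just an example label
tx.commit();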


Re: Dynamic control of graph configuration

hadoopmarc@...
 

Hi Frederick,

This is a good question, but I have no answer. For myself, I have always taken a fresh JanusGraph instance whenever I wanted to change one of the properties!

Best wishes,    Marc


Re: Avoiding duplicate vertex creation using unique indices

hadoopmarc@...
 

Hi Umesh,

I read this yesterday and thought your reasoning was sound, but at the same time it seemed unlikely that the lock on the property key is in the ref docs for no reason. Just now a scenario occurred to me in which both locks are relevant, but it actually speaks in favor of your approach: if you take a lock on both the property key and the index, then with parallel transactions one transaction could get the lock on the property key and the other the lock on the index (hopefully other mechanisms prevent both transactions from failing). If you want to pursue this matter, you will have to investigate which scenarios are covered by tests in the JanusGraph git repo and try to introduce a failing test.

I also thought about threaded transactions on a single JanusGraph instance, where in one transaction two threads try to add a name to the same vertex, but that scenario should be handled by the cardinality of the property.

Best wishes,

Marc


Re: Failed to connect to a Cassandra cluster's all nodes without default port

zhouyu74748585@...
 

It works well with version 0.6.0-SNAPSHOT.
Hope the release version comes soon!


Re: Failed to connect to a Cassandra cluster's all nodes without default port

Boxuan Li
 

Sorry, I misunderstood your problem. I thought you had a cluster with different transport ports, which is not supported prior to Cassandra 4.

It turns out your cluster is using a uniform non-default transport port for every host. I am not 100% sure, but it seems this is fixed in the DataStax Java driver 4, which is included in JanusGraph 0.6.0. Can you try the master version of JanusGraph and see if you still have this problem?



Re: Failed to connect to a Cassandra cluster's all nodes without default port

Boxuan Li
 

Got it. It does not seem to be a JanusGraph problem. I didn't dig deep into it, but it seems to be a limitation of Cassandra. See https://datastax-oss.atlassian.net/browse/JAVA-1388#icft=JAVA-1388 and the other tickets mentioned there.

Right now I would suggest you avoid using different native transport ports in the same Cassandra cluster.

On Jun 23, 2021, at 3:01 PM, zhouyu74748585@... wrote:

Hi,
I am sorry for the confusion. I got the line number 220 from a .class file, not a .java file; the line number in the .java file is 246, as you can read in the screenshot.
I don't mean that this line causes the issue, just that it looked like a good position to set the port that replaces the default port.


Avoiding duplicate vertex creation using unique indices

Umesh Gade <er.umeshgade@...>
 

Hi All,
        To avoid duplicate vertex creation due to parallel transactions, we are using index uniqueness over the property which defines the uniqueness of a vertex. As per the docs (https://docs.janusgraph.org/advanced-topics/eventual-consistency/), we need to specify a lock on both the property and the index:

mgmt.setConsistency(name, ConsistencyModifier.LOCK)  // Ensures only one name per vertex
mgmt.setConsistency(index, ConsistencyModifier.LOCK) // Ensures name uniqueness in the graph

As per our observation, specifying the lock only on the index blocks parallel transaction commits, which avoids duplicate vertex creation. We didn't see any behavior change with or without the lock on the property.

Can anybody help me understand the significance of the lock on the property, i.e. name in the example above?
Any example scenario to understand the meaning of "Ensures only one name per vertex"?
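
For reference, the full schema setup around those two lines, as given in the eventual-consistency docs (Gremlin console; 'consistentName' and 'byConsistentName' are the docs' example names):

mgmt = graph.openManagement()
name = mgmt.makePropertyKey('consistentName').dataType(String.class).make()
index = mgmt.buildIndex('byConsistentName', Vertex.class).addKey(name).unique().buildCompositeIndex()
mgmt.setConsistency(name, ConsistencyModifier.LOCK)  // Ensures only one name per vertex
mgmt.setConsistency(index, ConsistencyModifier.LOCK) // Ensures name uniqueness in the graph
mgmt.commit()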

--
Sincerely,
Umesh Gade