
Re: Union with Count returning unexpected results

hadoopmarc@...
 

Hi Vinayak,

I guess this has to do with differences in lazy vs. eager evaluation between the two queries. The TinkerPop reference docs read the aggregated values with cap('A','E1','B','E2','C') rather than with select(), to force eager evaluation; see https://tinkerpop.apache.org/docs/current/reference/#store-step

Best wishes, Marc

For other readers, here are the queries from the original post in a more readable format:

g2.inject(1).union(
  V().has('title', 'A').aggregate('v1').union(
    outE().has('title', 'E1').aggregate('e').inV().has('title', 'B'),
    outE().has('title', 'E2').aggregate('e').inV().has('title','C')
    ).aggregate('v2')
  ).
  select('v1').dedup().as('sourceCount').
  select('e').dedup().as('edgeCount').
  select('v2').dedup().as('destinationCount').
  select('sourceCount','edgeCount','destinationCount').by(unfold().count())


g2.inject(1).union(
  V().has('title', 'A').aggregate('A').union(
    outE().has('title', 'E1').aggregate('E1').inV().has('title', 'B').aggregate('B'),
    outE().has('title', 'E2').aggregate('E2').inV().has('title','C').aggregate('C')
    )
  ).
  select('A').dedup().as('ACount').
  select('E1').dedup().as('E1Count').
  select('B').dedup().as('BCount').
  select('E2').dedup().as('E2Count').
  select('C').dedup().as('CCount').
  select('ACount','E1Count','BCount','E2Count','CCount').by(unfold().count())
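The lazy vs. eager point can be illustrated outside Gremlin. The sketch below is a plain-Java analogy, not TinkerPop code, and its class and method names are hypothetical: a lazy "aggregate" step adds each element to a side-effect list only when that element is pulled, so reading the list partway through sees a partial aggregate, while draining the pipeline first (a barrier, which is what cap() forces) sees every element.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class LazyVsEager {
    // A lazy "aggregate" step: wraps an upstream iterator and adds each
    // element to a shared side-effect list only when that element is pulled.
    static <T> Iterator<T> aggregateLazily(Iterator<T> upstream, List<T> sideEffect) {
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return upstream.hasNext();
            }

            @Override
            public T next() {
                T element = upstream.next();
                sideEffect.add(element);
                return element;
            }
        };
    }

    // Reads the side effect after pulling only the first element (lazy read).
    static int countAfterFirstPull(List<Integer> source) {
        List<Integer> seen = new ArrayList<>();
        Iterator<Integer> it = aggregateLazily(source.iterator(), seen);
        if (it.hasNext()) {
            it.next();
        }
        return seen.size();
    }

    // Drains the whole pipeline first, like a barrier, then reads the side effect.
    static int countAfterBarrier(List<Integer> source) {
        List<Integer> seen = new ArrayList<>();
        Iterator<Integer> it = aggregateLazily(source.iterator(), seen);
        while (it.hasNext()) {
            it.next();
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<Integer> source = List.of(1, 2, 3, 4, 5);
        System.out.println("lazy read:  " + countAfterFirstPull(source)); // 1
        System.out.println("eager read: " + countAfterBarrier(source)); // 5
    }
}
```

The same source yields different counts depending only on when the side effect is read, which is the kind of discrepancy seen between the two queries.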


Re: How to replay a transaction log from the begining

ojas.dubey@...
 

Hi Boxuan,

Thanks, this indeed helped. Initially nothing happened (or at least it appeared that way), so I changed the start time to EPOCH and left the application running for a while; after some time the callback was executed.

So I was wondering how the log processor uses the start time value to replay the log, and why it took so long to replay. Is there a way to reduce this time by setting the correct UTC time as the start time (as I don't want to use EPOCH every time), so that the callback is executed immediately?

Also, is there a difference between the Instant.now() value used by the ReadMarker and the actual local time used by the application? The ReadMarker initialization logs showed a different time, e.g.:

2021-06-30T13:21:32.003+05:30 INFO |InternalEventLogger|||||||o.j.diskstorage.log.kcvs.KCVSLog|Loaded identified ReadMarker start time 2021-06-30T04:00:00Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@4051e47b
2021-06-30T13:21:32.008+05:30 INFO |InternalEventLogger|||||||o.j.diskstorage.log.kcvs.KCVSLog|Loaded identified ReadMarker start time 2021-06-30T04:00:00Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@6a332fb7
2021-06-30T13:21:32.013+05:30 INFO |InternalEventLogger|||||||o.j.diskstorage.log.kcvs.KCVSLog|Loaded identified ReadMarker start time 2021-06-30T04:00:00Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@1f2cf847
2021-06-30T13:21:32.015+05:30 INFO |InternalEventLogger|||||||o.j.diskstorage.log.kcvs.KCVSLog|Loaded identified ReadMarker start time 2021-06-30T04:00:00Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@3cc2c61b

while the application log shows another time

2021-06-30T13:26:15.794+05:30 INFO |InternalEventLogger||c.a.o.s.b.s.i.Test|Started tx standardjanusgraphtx[0x39c4068c] for requestId 5ba073c8-68c2-4356-8097-2e62ef56299a and batchId 9632dceb-7996-4464-91d8-1b157fc8ca00


Regards,
Ojas 
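On the timestamp question: java.time.Instant is a point on the UTC timeline and always prints in UTC with a trailing Z, which is why the ReadMarker start time ends in Z while the log line prefixes and the application log carry a +05:30 offset. The sketch below uses only standard java.time; the class and helper names are hypothetical. It converts between the two notations and builds a start time from a local wall-clock time instead of Instant.EPOCH. It does not describe how KCVSLog uses the start time internally.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

class StartTimeExamples {
    // Converts an ISO-8601 timestamp with an offset (as in the application log)
    // to the UTC Instant that Instant.toString() prints.
    static Instant toInstant(String isoWithOffset) {
        return OffsetDateTime.parse(isoWithOffset).toInstant();
    }

    // Builds an Instant from a local wall-clock time in the given time zone.
    static Instant fromLocal(int year, int month, int day, int hour, int minute, String zone) {
        return ZonedDateTime.of(year, month, day, hour, minute, 0, 0, ZoneId.of(zone)).toInstant();
    }

    public static void main(String[] args) {
        // 13:21:32.003 at +05:30 is 07:51:32.003 UTC
        System.out.println(toInstant("2021-06-30T13:21:32.003+05:30"));
        // 09:30 in Asia/Kolkata (+05:30) is 04:00 UTC
        System.out.println(fromLocal(2021, 6, 30, 9, 30, "Asia/Kolkata"));
        // A start time one hour before now, instead of Instant.EPOCH
        System.out.println(Instant.now().minus(1, ChronoUnit.HOURS));
    }
}
```

Any of these Instant values could be passed to setStartTime(...); the printed form is always UTC regardless of the JVM's default time zone.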


Re: How to replay a transaction log from the begining

Boxuan Li
 

Hi Ojas,

Your `startLogProcessor` method looks good to me. I suspect that you are not using the transaction returned in step 1 to do the vertex/edge operations. In step 2, you are using `g.addV`, which automatically starts a new anonymous transaction. To commit that transaction you would call `g.tx().commit()`, and of course it will not be captured by your log processor. Therefore, you need to make sure you are using the transaction associated with the log processor to do the mutations.

Try replacing `g` with `tx.traversal()`, where `tx` is the transaction returned in step 1. Your code should then look like this:

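// tx is started with buildTransaction().logIdentifier(identifier).start()
JanusGraphTransaction tx = startJanusGraphTransaction(identifier);
// Mutate through tx.traversal() rather than g, so the change belongs to the logged transaction
tx.traversal().addV("idVertex").property(id, "uuid").next();
// Committing tx is what makes the change visible to the log processor
tx.commit();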

Hope this helps.

Best,
Boxuan
 


Re: How to replay a transaction log from the begining

ojas.dubey@...
 

Hi Boxuan,
Please find the code below:

1. Starting the transaction (identifier value is TestBatchLogger)

public JanusGraphTransaction startJanusGraphTransaction(String identifier) {
    return janusGraphSchema.getConfiguredGraph().buildTransaction().logIdentifier(identifier).start();
}

2. Multiple add vertex/edge operations on the graph through g, e.g.:

GraphTraversal<Vertex, Vertex> traversal = g.addV("idVertex")
        .property(id, "uuid");
return traversal.next();

Here g is the Gremlin GraphTraversalSource object obtained from JanusGraphFactory.open(<graphConfigPropertiesFile>).traversal().

3. Commit on the transaction object returned by the start transaction method.

I then wanted to replay the log of this transaction. For this, I called the method below:

public void startLogProcessor(String identifier) {
    LogProcessorFramework logProcessor =
            JanusGraphFactory.openTransactionLog(graph);
    logProcessor.addLogProcessor(identifier)
            .setProcessorIdentifier("BatchTxLogger")
            .setStartTime(Instant.now())
            .addProcessor((tx, txId, changeState) -> {
                System.out.println("tx--" + tx.toString() + "  txId--" + txId.toString()
                        + "  changeState--" + changeState.toString());
                for (JanusGraphVertex v : changeState.getVertices(Change.ANY)) {
                    System.out.println(v.label());
                }
            })
            .build();
}

But I am unable to get the println output. I tried different values for the start time (Instant.EPOCH, Instant.now().minusMillis(500), etc.) but did not get any println output on the console (and no exception or error in any case).
I also tried removing the identifier, which gave an invalid ReadMarker error, so after checking the class files I also removed the start time to resolve that error. Still no console output :(

Regards,
Ojas


Re: How to replay a transaction log from the begining

Boxuan Li
 

Hi Ojas,

Can you share your code and explain what you mean by "unable to work"? Is it running but not producing results as you expected, or encountering errors/exceptions?

Best,
Boxuan


Re: How to replay a transaction log from the begining

ojas.dubey@...
 

Hi,
 
I was wondering whether this has been implemented.
 
I am running JanusGraph over Cassandra and was trying to work with the transaction log feature using the provided documentation.
 
So far I have managed to start the transaction with the identifier (the ulog tables are created in Cassandra), but I am still unable to get the Java callback to work. I have browsed through some threads here as well, but still could not get it to work.
 
Any help is appreciated.
 
 
Regards,
Ojas


Union with Count returning unexpected results

Vinayak Bali
 

Hi All, 

The objective is to count the number of nodes and edges. 

Query:
g2.inject(1).union(V().has('title', 'A').aggregate('v1').union(outE().has('title', 'E1').aggregate('e').inV().has('title', 'B'),outE().has('title', 'E2').aggregate('e').inV().has('title', 'C')).aggregate('v2')).select('v1').dedup().as('sourceCount').select('e').dedup().as('edgeCount').select('v2').dedup().as('destinationCount').select('sourceCount','edgeCount','destinationCount').by(unfold().count())

[
    {
        "sourceCount": 1203,
        "edgeCount": 9922,
        "destinationCount": 9926
    }
]

But when the aggregate() steps are placed inside the inner union, to count each type of node and edge separately, the results are different.

Query: 
g2.inject(1).union(V().has('title', 'A').aggregate('A').union(outE().has('title', 'E1').aggregate('E1').inV().has('title', 'B').aggregate('B'),outE().has('title', 'E2').aggregate('E2').inV().has('title', 'C').aggregate('C'))).select('A').dedup().as('ACount').select('E1').dedup().as('E1Count').select('B').dedup().as('BCount').select('E2').dedup().as('E2Count').select('C').dedup().as('CCount').select('ACount','E1Count','BCount','E2Count','CCount').by(unfold().count())

[
    {
        "vendorCount": 1203,
        "supply1Count": 4,
        "productCount": 4,
        "supplyCount": 0,
        "materialCount": 0
    }
]

The node and edge counts don't match after this small change. Could you take a look and share your thoughts?

Thanks & Regards,
Vinayak




Re: Dynamic control of graph configuration

Boxuan Li
 

Have you tried https://docs.janusgraph.org/basics/transactions/#transaction-configuration ? It allows you to enable/disable storage.batch-loading per transaction. 


Re: Dynamic control of graph configuration

hadoopmarc@...
 

Hi Frederick,

This is a good question, but I have no answer. Personally, I have always taken a fresh JanusGraph instance whenever I wanted to change one of these properties!

Best wishes, Marc


Re: Avoiding duplicate vertex creation using unique indices

hadoopmarc@...
 

Hi Umesh,

I read this yesterday and thought your reasoning was sound, but at the same time it seemed unlikely that it is in the ref docs for no reason. Just now, a scenario occurred to me where both locks are relevant, but this scenario actually speaks in favor of your approach! If you take a lock on both the property key and the index, then with parallel transactions one transaction could get the lock on the property key and the other the lock on the index (hopefully other mechanisms prevent both transactions from failing). If you want to pursue this matter, you will have to investigate which scenarios are covered by tests in the JanusGraph git repo and try to introduce a failing test.

I also thought about threaded transactions on a single janusgraph instance where in one transaction two threads try to add a name to the same vertex, but that scenario should be handled by the cardinality of the property.

Best wishes,

Marc


Re: Failed to connect to a Cassandra cluster's all nodes without default port

zhouyu74748585@...
 

It works well with version 0.6.0-SNAPSHOT.
Hope the release version comes soon.


Re: Failed to connect to a Cassandra cluster's all nodes without default port

Boxuan Li
 

Sorry, I misunderstood your problem. I thought you had a cluster with different transport ports per host, which is not supported prior to Cassandra 4.

It turns out your cluster uses the same non-default transport port for every host. I am not 100% sure, but it seems this is fixed in DataStax Java driver 4, which is included in JanusGraph 0.6.0. Can you try the master version of JanusGraph and see if you still have this problem?

On Jun 24, 2021, at 10:50 AM, Boxuan Li via lists.lfaidata.foundation <liboxuan=connect.hku.hk@...> wrote:

Got it. It does not seem to be a JanusGraph problem. I didn’t dig deep into it, but it seems to be a limitation of Cassandra. See https://datastax-oss.atlassian.net/browse/JAVA-1388#icft=JAVA-1388 and the other tickets mentioned there.

Right now I would suggest you avoid using different native transport ports in the same Cassandra cluster.




Re: Failed to connect to a Cassandra cluster's all nodes without default port

Boxuan Li
 

Got it. It does not seem to be a JanusGraph problem. I didn’t dig deep into it, but it seems to be a limitation of Cassandra. See https://datastax-oss.atlassian.net/browse/JAVA-1388#icft=JAVA-1388 and the other tickets mentioned there.

Right now I would suggest you avoid using different native transport ports in the same Cassandra cluster.

On Jun 23, 2021, at 3:01 PM, zhouyu74748585@... wrote:

Hi,
Sorry for the confusion. I got line number 220 from a .class file, not a .java file.
The line number in the .java file is 246; you can see it in the screenshot.
I don't mean that this line causes the issue; I just think it is a good place to set the port to replace the default port.


Avoiding duplicate vertex creation using unique indices

Umesh Gade <er.umeshgade@...>
 

Hi All,
To avoid duplicate vertex creation due to parallel transactions, we are using index uniqueness over the property that defines the uniqueness of a vertex. As per the docs, we need to specify a lock on both the index and the property (https://docs.janusgraph.org/advanced-topics/eventual-consistency/):
mgmt.setConsistency(name, ConsistencyModifier.LOCK) // Ensures only one name per vertex
mgmt.setConsistency(index, ConsistencyModifier.LOCK) // Ensures name uniqueness in the graph

In our observation, specifying a lock only on the index blocks parallel transaction commits, which avoids duplicate vertex creation. We didn't see any behavior change with or without the lock on the property.

Can anybody help me understand the significance of the lock on the property, i.e. name in the above example?
Is there an example scenario that illustrates the meaning of "Ensures only one name per vertex"?

--
Sincerely,
Umesh Gade
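As a hedged illustration of what the index lock guarantees, here is a plain-Java analogy, not JanusGraph's locking implementation: a check-then-insert on a unique key is only safe if the check and the insert happen under one lock on the index, otherwise two concurrent "transactions" can both see the name as absent and both insert it. The class, map, and method names are hypothetical, and the sketch does not model the separate lock on the property key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class UniqueIndexSketch {
    // Hypothetical in-memory "unique index": property value -> vertex id.
    private final Map<String, Integer> nameIndex = new HashMap<>();
    private final AtomicInteger nextId = new AtomicInteger();
    private final Object indexLock = new Object();

    // Check-then-insert under a single lock on the index, so only one
    // caller can create the vertex for a given name.
    Integer getOrCreate(String name) {
        synchronized (indexLock) {
            Integer existing = nameIndex.get(name);
            if (existing != null) {
                return existing;
            }
            int id = nextId.incrementAndGet();
            nameIndex.put(name, id);
            return id;
        }
    }

    // Number of "vertices" created so far.
    int vertexCount() {
        return nextId.get();
    }

    // Runs the same getOrCreate call from several threads in parallel.
    static int runConcurrently(int threads) throws InterruptedException {
        UniqueIndexSketch graph = new UniqueIndexSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> graph.getOrCreate("alice"));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return graph.vertexCount();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runConcurrently(8)); // 1
    }
}
```

Removing the synchronized block would make the count nondeterministic (sometimes greater than 1), which is the duplicate-vertex race the index lock is meant to prevent.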


Re: Failed to connect to a Cassandra cluster's all nodes without default port

zhouyu74748585@...
 

Hi,
Sorry for the confusion. I got line number 220 from a .class file, not a .java file.
The line number in the .java file is 246; you can see it in the screenshot.
I don't mean that this line causes the issue; I just think it is a good place to set the port to replace the default port.


Re: Failed to connect to a Cassandra cluster's all nodes without default port

Boxuan Li
 

I am a bit confused here.

(NetworkUtil.isLocalConnection(hostnames[0])) ? Deployment.LOCAL : Deployment.REMOTE

Are we looking at the same line here?
On Jun 22, 2021, at 5:32 PM, zhouyu74748585@... wrote:

JanusGraph tries to connect to 192.168.223.3:9042.

I found the following code in janusgraph-cql-0.5.3.jar (branch V0.5, commit fed8439):

final Builder builder = Cluster.builder()
        .addContactPointsWithPorts(contactPoints)
        .withClusterName(configuration.get(CLUSTER_NAME));
        //.withPort(configuration.get(CLUSTER_PORT));

I appended the last line to solve this problem, but it only works when all the hosts use the same port.



Re: Count Query

Vinayak Bali
 

I have gone through the commit and I think it will help. To confirm, we need to test it with the master branch, as it has not been released yet. Earlier, we downloaded the full distribution from the releases page (janusgraph-full-0.5.2) and moved forward from there. I am not sure how to test the queries on the master branch. Please share the setup steps or any documentation regarding this. Thank you.


On Fri, Jun 18, 2021 at 11:59 PM <owner.mad.epa@...> wrote:
Could you check your query on the master branch? Possibly this fix improves this type of query: https://github.com/JanusGraph/janusgraph/commit/7550033d1746d0844bac79e3a8b85685c2c6c79d


Re: Failed to connect to a Cassandra cluster's all nodes without default port

Ronnie
 

Yes, we face the same issue when using custom ports with a C* backend. The first host:port in storage.hostname is processed correctly. For the remaining comma-separated host:port entries, JanusGraph tries to connect to those servers on the default port 9042. We have had this problem since JanusGraph 0.4.x and were hoping it would be fixed in 0.5.3, but the issue still exists.


Re: Failed to connect to a Cassandra cluster's all nodes without default port

zhouyu74748585@...
 

JanusGraph tries to connect to 192.168.223.3:9042.

I found the following code in janusgraph-cql-0.5.3.jar (branch V0.5, commit fed8439):

final Builder builder = Cluster.builder()
        .addContactPointsWithPorts(contactPoints)
        .withClusterName(configuration.get(CLUSTER_NAME));
        //.withPort(configuration.get(CLUSTER_PORT));

I appended the last line to solve this problem, but it only works when all the hosts use the same port.


Re: Failed to connect to a Cassandra cluster's all nodes without default port

Boxuan Li
 

What happens when you only use “192.168.223.3” as your storage.hostname, and don’t set storage.port?

I am also not sure which version of line 220 you are looking at. What is the commit hash? You can also just paste the code snippet here.

On Tue, Jun 22, 2021 at 9:42 AM, <zhouyu74748585@...> wrote:

Hi,
the version is 0.53,and the line is line 220 in org.janusgraph.diskstorage.cql.CQLStoreManager
The cluster builder's port is default 9042.
when I set one host,the logs are the same.if I set three hosts,janusgraph only chose one to connect to cassandra cluster。
janusgraph gets other ips from cluster's metadata.but use the default port 9042 in the local 
cluster object,then the local cluster try to connect to the host with correct ip and default port.
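The mechanism described in this thread can be modeled in a few lines. The sketch below is a hypothetical model, not the DataStax driver or JanusGraph code: explicitly configured contact points carry their own ports, but peers discovered from cluster metadata (as described above) arrive with only an IP, so the only port available for them is the single cluster-wide port. The parse helper handles IPv4 addresses and hostnames only; all names are assumptions.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

class ContactPoints {
    static final int DEFAULT_CQL_PORT = 9042;

    // Hypothetical parser for a comma-separated "host[:port]" list
    // (IPv4 or hostnames only); entries without a port get clusterPort.
    static List<InetSocketAddress> parse(String hostnames, int clusterPort) {
        List<InetSocketAddress> points = new ArrayList<>();
        for (String entry : hostnames.split(",")) {
            String s = entry.trim();
            int colon = s.lastIndexOf(':');
            if (colon < 0) {
                points.add(InetSocketAddress.createUnresolved(s, clusterPort));
            } else {
                int port = Integer.parseInt(s.substring(colon + 1));
                points.add(InetSocketAddress.createUnresolved(s.substring(0, colon), port));
            }
        }
        return points;
    }

    // A peer discovered from metadata has only an IP in this model,
    // so it can only be paired with the cluster-wide port.
    static InetSocketAddress discoveredPeer(String ip, int clusterPort) {
        return InetSocketAddress.createUnresolved(ip, clusterPort);
    }

    public static void main(String[] args) {
        List<InetSocketAddress> seeds = parse("192.168.223.1:9043", DEFAULT_CQL_PORT);
        System.out.println(seeds.get(0).getPort()); // 9043
        // Without a cluster-wide override, the discovered peer falls back to 9042
        System.out.println(discoveredPeer("192.168.223.3", DEFAULT_CQL_PORT).getPort()); // 9042
        // With a uniform custom port (the withPort workaround), it gets 9043
        System.out.println(discoveredPeer("192.168.223.3", 9043).getPort()); // 9043
    }
}
```

This is also why a single withPort(...) value only helps when every host uses the same port: the model has no per-host port for discovered peers.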
 
