Transactional operations in JanusGraph through Gremlin queries


"anj...@gmail.com" <anjani...@...>
 

Thanks, Marc, for your help and time.

Regards,
Anjani

On Tuesday, 3 November 2020 at 21:45:33 UTC+5:30 HadoopMarc wrote:
Hi Anjani,

No, existing schema elements cannot be modified (apart from renames), see:

https://docs.janusgraph.org/basics/schema/#changing-schema-elements

Best wishes,     Marc

On Tuesday, 3 November 2020 at 11:08:00 UTC+1 anj...@... wrote:
Thanks Marc, I missed step 4. Thank you very much for pointing it out.

One more question: our graph is already running in production and the properties are defined, but no consistency modifier is set on them.
If I add a consistency modifier to the existing properties, will it be picked up?

Thanks,
Anjani

On Monday, 2 November 2020 at 21:41:18 UTC+5:30 HadoopMarc wrote:
Hi Anjani,

See step 4 in the ref docs link I sent earlier: the locks are not released until the entire transaction is committed or rolled back.

Marc

On Monday, 2 November 2020 at 13:21:57 UTC+1 anj...@... wrote:
Hi Marc,

Thanks for your detailed response. My understanding is that a node is locked automatically during an operation and the lock is released right after it, without waiting for the commit.

Suppose I need to update 3 nodes. I can write it as below. This way, if an exception occurs for any of the nodes, nothing is committed, so I can control it.
try {
    g.V(4104).property("NodeUpdatedDate", new Date()).next();
    g.V(4288).property("NodeUpdatedDate", new Date()).next();
    g.V(4188).property("NodeUpdatedDate", new Date()).next();
    g.tx().commit();
} catch (Exception e) {
    g.tx().rollback(); // discard the partial updates
    // Recover, retry
}

With this, the first node V(4104) is locked by the thread while its update is happening, but the lock is released while the updates for the other nodes V(4288) and V(4188) are happening. This means another thread can update V(4104) before the transaction is committed, which might result in data inconsistency.

I was thinking of somehow acquiring a lock on all nodes before doing any operation on them, something like:
g.V(4288).lock(), g.V(4104).lock(), g.V(4188).lock()
After locking explicitly, perform the operations and unlock as part of the commit.

Thanks,
Anjani

On Saturday, 31 October 2020 at 17:01:05 UTC+5:30 HadoopMarc wrote:
Hi Anjani,

Do you mean that (extremely rare) failure situations are still possible despite the use of locking and JanusGraph transactions? I am not sure I can think of one, and it would depend on ill-timed failures in the backend (e.g. a power failure). One thing to worry about, and which you could properly test, is whether all mutations in the JanusGraph transaction are sent to the backend in a single network request (otherwise JanusGraph could persist two of the five nodes and then fail). Several configuration properties might influence this:

query.batch
storage.cql.atomic-batch-mutate
storage.cql.batch-statement-size

Also see the comments for the tx.log-tx property.

HTH,    Marc

On Friday, 30 October 2020 at 15:58:39 UTC+1 anj...@... wrote:
Hi Marc,

Thanks for your response. I had looked at the page you shared earlier, and my understanding is that we can define consistency at the property level: if the same property is modified by two different threads, a consistency check happens in the backend and the transaction either succeeds or throws a locking exception. But this applies to a property of a single node.

In my case I want to add/update a property on multiple nodes based on some condition. For example, based on some rules we see that some nodes are related and we want to group them; for that we want to add/update one property on multiple nodes, say on 5 nodes. In that case we want to lock all 5 nodes, update them and then release the locks.
- If the update to any of the nodes fails, we should roll back the updates to the other nodes as well.
- While the update to the 5 nodes is going on, no other thread should modify that property.

Thanks,
Anjani

 

On Friday, 30 October 2020 at 19:26:10 UTC+5:30 HadoopMarc wrote:
Hi Anjani,

I am not sure if I understand your question and if your question already took the following into account:

https://docs.janusgraph.org/advanced-topics/eventual-consistency/#data-consistency

What aspect of transactions do you miss? You can choose between tx.commit() for successful insertion and tx.rollback() in case of exceptions.

Please clarify!

Marc

On Friday, 30 October 2020 at 08:15:36 UTC+1 anj...@... wrote:
Hi All,

We are using JanusGraph 0.5.2 with Cassandra and Elasticsearch.
Currently we use Gremlin queries in Java to add or update a node.

We have a use case where we need to update multiple nodes for a given piece of metadata. We want to make sure that updates to multiple nodes are transactional and that no other thread can update those nodes while the updates are happening.

Through Gremlin queries, do we have an option for:
 - transactional updates?
 - locking/unlocking of nodes for updates?

Appreciate your thoughts/inputs.

Thanks,
Anjani


HadoopMarc <bi...@...>
 

Hi Anjani,

No, existing schema elements cannot be modified (apart from renames), see:

https://docs.janusgraph.org/basics/schema/#changing-schema-elements

Best wishes,     Marc



"anj...@gmail.com" <anjani...@...>
 

Thanks Marc, I missed step 4. Thank you very much for pointing it out.

One more question: our graph is already running in production and the properties are defined, but no consistency modifier is set on them.
If I add a consistency modifier to the existing properties, will it be picked up?

Thanks,
Anjani



HadoopMarc <bi...@...>
 

Hi Anjani,

See step 4 in the ref docs link I sent earlier: the locks are not released until the entire transaction is committed or rolled back.

Marc



"anj...@gmail.com" <anjani...@...>
 

Hi Marc,

Thanks for your detailed response. My understanding is that a node is locked automatically during an operation and the lock is released right after it, without waiting for the commit.

Suppose I need to update 3 nodes. I can write it as below. This way, if an exception occurs for any of the nodes, nothing is committed, so I can control it.
try {
    g.V(4104).property("NodeUpdatedDate", new Date()).next();
    g.V(4288).property("NodeUpdatedDate", new Date()).next();
    g.V(4188).property("NodeUpdatedDate", new Date()).next();
    g.tx().commit();
} catch (Exception e) {
    g.tx().rollback(); // discard the partial updates
    // Recover, retry
}
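
The "recover, retry" step in the catch block could be sketched as a bounded retry loop. This is only an illustration in plain Java with no JanusGraph types: RetrySketch and withRetry are hypothetical names, and it assumes that each attempt redoes the whole unit of work (all the updates plus the commit) and that a failed attempt is rolled back before the next one starts.

```java
import java.util.function.Supplier;

final class RetrySketch {
    /**
     * Calls attempt up to maxAttempts times and returns the first result.
     * If every attempt throws, the last exception is rethrown to the caller.
     */
    static <T> T withRetry(int maxAttempts, Supplier<T> attempt) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                // In real code: apply all vertex updates, then commit.
                return attempt.get();
            } catch (RuntimeException e) {
                // In real code: roll back here before the next attempt.
                last = e;
            }
        }
        throw last;
    }
}
```

A real version would probably wait a little between attempts and retry only on exceptions that signal lock contention, not on every failure.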
  
With this, the first node V(4104) is locked by the thread while its update is happening, but the lock is released while the updates for the other nodes V(4288) and V(4188) are happening. This means another thread can update V(4104) before the transaction is committed, which might result in data inconsistency.

I was thinking of somehow acquiring a lock on all nodes before doing any operation on them, something like:
g.V(4288).lock(), g.V(4104).lock(), g.V(4188).lock()
After locking explicitly, perform the operations and unlock as part of the commit.

Thanks,
Anjani



HadoopMarc <bi...@...>
 

Hi Anjani,

Do you mean that (extremely rare) failure situations are still possible despite the use of locking and JanusGraph transactions? I am not sure I can think of one, and it would depend on ill-timed failures in the backend (e.g. a power failure). One thing to worry about, and which you could properly test, is whether all mutations in the JanusGraph transaction are sent to the backend in a single network request (otherwise JanusGraph could persist two of the five nodes and then fail). Several configuration properties might influence this:

query.batch
storage.cql.atomic-batch-mutate
storage.cql.batch-statement-size

Also see the comments for the tx.log-tx property.
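
As a rough sketch of how these settings could be collected for a test run: only the four property names come from this message. The values below are placeholders to experiment with, not recommendations, and BatchConfigSketch and batchSettings are hypothetical names. Please check the configuration reference for the defaults and meaning in your JanusGraph version.

```java
import java.util.Properties;

final class BatchConfigSketch {
    /** Returns the four settings named above with placeholder values. */
    static Properties batchSettings() {
        Properties p = new Properties();
        p.setProperty("query.batch", "true");                     // placeholder value
        p.setProperty("storage.cql.atomic-batch-mutate", "true"); // placeholder value
        p.setProperty("storage.cql.batch-statement-size", "20");  // placeholder value
        p.setProperty("tx.log-tx", "true");                       // placeholder value
        return p;
    }

    public static void main(String[] args) {
        // Print the settings as key=value lines (iteration order is not guaranteed).
        batchSettings().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The same key=value pairs would normally live in the properties file the graph is opened with; testing each variation against a forced mid-commit failure should show whether partial writes can still occur.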

HTH,    Marc



"anj...@gmail.com" <anjani...@...>
 

Hi Marc,

Thanks for your response. I had looked at the page you shared earlier, and my understanding is that we can define consistency at the property level: if the same property is modified by two different threads, a consistency check happens in the backend and the transaction either succeeds or throws a locking exception. But this applies to a property of a single node.

In my case I want to add/update a property on multiple nodes based on some condition. For example, based on some rules we see that some nodes are related and we want to group them; for that we want to add/update one property on multiple nodes, say on 5 nodes. In that case we want to lock all 5 nodes, update them and then release the locks.
- If the update to any of the nodes fails, we should roll back the updates to the other nodes as well.
- While the update to the 5 nodes is going on, no other thread should modify that property.

Thanks,
Anjani

 



HadoopMarc <bi...@...>
 

Hi Anjani,

I am not sure if I understand your question and if your question already took the following into account:

https://docs.janusgraph.org/advanced-topics/eventual-consistency/#data-consistency

What aspect of transactions do you miss? You can choose between tx.commit() for successful insertion and tx.rollback() in case of exceptions.
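
The commit-or-rollback choice can be sketched in plain Java as below. GraphTx is a hypothetical stand-in interface for whatever transaction handle you use (it is not a JanusGraph type), and TxTemplate and inTransaction are illustrative names only.

```java
/** Hypothetical stand-in for a transaction handle; not a JanusGraph type. */
interface GraphTx {
    void commit();
    void rollback();
}

final class TxTemplate {
    /**
     * Runs the work and commits if it succeeds. If the work or the commit
     * throws, the transaction is rolled back and the exception is rethrown.
     */
    static void inTransaction(GraphTx tx, Runnable work) {
        try {
            work.run();
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        }
    }
}
```

With a wrapper like this, all mutations made inside work are either committed together or discarded together, which is the all-or-nothing behaviour a multi-node update needs.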

Please clarify!

Marc



"anj...@gmail.com" <anjani...@...>
 

Hi All,

We are using JanusGraph 0.5.2 with Cassandra and Elasticsearch.
Currently we use Gremlin queries in Java to add or update a node.

We have a use case where we need to update multiple nodes for a given piece of metadata. We want to make sure that updates to multiple nodes are transactional and that no other thread can update those nodes while the updates are happening.

Through Gremlin queries, do we have an option for:
 - transactional updates?
 - locking/unlocking of nodes for updates?

Appreciate your thoughts/inputs.

Thanks,
Anjani