How to circumvent transaction cache?


timon.schneider@...
 

Our application has transactions that edit many vertices representing elements of a branch. The branch itself is also represented by a vertex, which has a boolean property isPublished. Before committing such a transaction, we need to know whether another user has set the isPublished property on the branch vertex to true, in which case the transaction should be rolled back.

Here’s the problem:
* User A reads the branch vertex but doesn’t close transaction
* User B changes the isPublished property to true and commits (while A is still making changes)
* User A read locks the vertex with an external locking API
* User A queries the branch vertex again (to make sure isPublished is still false) in the same thread but gets the old values because of the transaction cache.
Now user A can commit data even though the branch isPublished is true.
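
The sequence above can be reproduced with a toy model. This is only a sketch: the shared map stands in for the storage backend and ToyTx stands in for a transaction with a read cache. All class and key names are made up for illustration and are not JanusGraph APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model only: a shared "storage" map plus a per-transaction read cache.
class StaleReadDemo {
    static final Map<String, Boolean> storage = new ConcurrentHashMap<>();

    static class ToyTx {
        private final Map<String, Boolean> cache = new HashMap<>();

        // The first read of a key goes to storage; later reads are served from the cache.
        boolean read(String key) {
            return cache.computeIfAbsent(key, storage::get);
        }
    }

    // Returns {A's first read, A's second read, read from a brand-new transaction}.
    static boolean[] run() {
        storage.put("branch.isPublished", false);
        ToyTx userA = new ToyTx();
        boolean first = userA.read("branch.isPublished");        // A reads the branch vertex
        storage.put("branch.isPublished", true);                 // B publishes and commits
        boolean second = userA.read("branch.isPublished");       // A re-reads in the same tx
        boolean fresh = new ToyTx().read("branch.isPublished");  // a new tx goes to storage
        return new boolean[] {first, second, fresh};
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("first=" + r[0] + " second=" + r[1] + " fresh=" + r[2]);
    }
}
```

In this model, A's second read is served from the transaction's own cache, while a new transaction sees B's committed value.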

I know it’s possible to use createThreadedTx() to circumvent the ThreadLocal transaction cache. However, such refreshes will be very common in our application and ideally we would be able to execute a refresh within the main transaction to minimise complexity and workarounds. Is this possible? And if not, are there any possibilities to turn off transaction cache entirely?

Thanks in advance,
Timon


Boxuan Li
 

Hi Timon,

I don’t think you will even be able to disable the tx-cache by using createThreadedTx() or, equivalently, newTransaction()/buildTransaction(). Unfortunately, as long as your transaction is not readOnly(), the effective vertex cache size of the transaction will be Math.max(100, cache.tx-cache-size).
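
As a small illustration of the rule just described, here is a hypothetical helper (not JanusGraph code). The lower bound of 100 is the value quoted above. Returning the configured size unchanged for read-only transactions is an assumption made for this sketch.

```java
// Hypothetical helper mirroring the rule described in this thread; not JanusGraph code.
class EffectiveCacheSize {
    static final int MIN_VERTEX_CACHE_SIZE = 100; // lower bound quoted above

    static int effectiveSize(int configuredTxCacheSize, boolean readOnly) {
        // Assumption: read-only transactions use the configured size as-is.
        return readOnly
                ? configuredTxCacheSize
                : Math.max(MIN_VERTEX_CACHE_SIZE, configuredTxCacheSize);
    }

    public static void main(String[] args) {
        // Setting cache.tx-cache-size to 0 does not switch the cache off for a writing tx.
        System.out.println(effectiveSize(0, false));
    }
}
```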

To the best of my knowledge, the only way to completely disable the transaction-level cache is to modify the JanusGraph source code. A workaround would be to always start a new transaction to check whether the value has changed.

Best regards,
Boxuan


timon.schneider@...
 

Thanks for your reply.

The issue is that we need to refresh some vertices mid-transaction. Rolling back is not an option, as that would erase the edits we're making in our transaction. Disabling the transaction cache could be one solution. Using a threaded tx could be an option as well, since that transaction does see edits made by other users, as opposed to the original transaction:
A reads vertex X, then starts a transaction and makes edits, but does not commit yet.
B may or may not edit X.
A continues editing, and before committing it needs to make sure vertex X was not changed by B, or else roll back.
Again, it is possible to read X by using a ThreadedTx, but I'm interested in whether there's another way to refresh a vertex mid-transaction.

Kr,
Timon


Nicolas Trangosi
 

Hi Timon,
It seems that you can force JG to re-read elements just before commit according to

I have never tried the mgmt.setConsistency option, but it may help you.

Regards,
Nicolas





timon.schneider@...
 

Thanks for your suggestion, but the consistency setting does not solve my problem.


Ted Wilmes
 

Hi Timon,
Jumping in late on this one, but I wanted to point out that even if you could re-read the vertex just before committing to check that your constraint still holds, most of the JG storage layers do not provide ACID guarantees. FoundationDB is the one distributed option that does, and BerkeleyDB can do it for a single-instance setup. Since you do not have ACID guarantees in most cases, I think you could still have a case where another transaction commits prior to your commit even though you saw isPublished = false when you checked it. One possible way around this without ACID would be to process all mutations for a branch on one thread, effectively single-threading access to it, so that you could know that no other user was writing to the branch while you were reading.
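
One way to picture the single-thread-per-branch idea is the sketch below. It is a minimal illustration, not a recommendation for a particular design. BranchSerializer and the branch ID strings are hypothetical names, and the submitted work stands in for whatever opens, edits and commits a transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: routes all work for one branch to one single-threaded executor.
class BranchSerializer {
    private final Map<String, ExecutorService> executors = new ConcurrentHashMap<>();

    <T> Future<T> submit(String branchId, Callable<T> work) {
        return executors
                .computeIfAbsent(branchId, id -> Executors.newSingleThreadExecutor())
                .submit(work);
    }

    void shutdown() {
        executors.values().forEach(ExecutorService::shutdown);
    }

    // Submits 100 increments of a non-thread-safe counter for the same branch.
    static int demo() {
        BranchSerializer serializer = new BranchSerializer();
        int[] counter = {0}; // deliberately unsynchronized
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            futures.add(serializer.submit("branch-1", () -> ++counter[0]));
        }
        try {
            for (Future<Integer> f : futures) {
                f.get(); // wait for each task; also makes its writes visible here
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            serializer.shutdown();
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because every task for "branch-1" runs on the same thread, the unsynchronized counter is never updated concurrently. Work for different branches can still proceed in parallel on their own executors.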

--Ted



Boxuan Li
 

Hi Timon,

As I mentioned earlier, the only way I can think of (assuming you are not concerned about the consistency of data storage as Ted mentioned) is to modify JanusGraph source code:

In CacheVertex class, there is a data structure, protected final Map<SliceQuery, EntryList> queryCache.

What you could do is to add a method to that class:

public void refresh() {
    queryCache.clear();
}

And then you can call refresh() whenever you want to load a fresh value from storage rather than from the cache:

((CacheVertex) v1).refresh();

Hope this helps,
Boxuan





hadoopmarc@...
 

Hi Timon,

Adding to Ted's answer, I can imagine that your new data enters your pipeline from a Kafka queue. With a microbatching solution, e.g. Apache Spark Streaming, you could pre-shuffle your data per microbatch to make sure that all data relating to a branch is in a single partition. After that, a single thread can handle this partition in one JanusGraph transaction. This approach seems to fit your use case better than trying to circumvent ACID limits in a tricky way.
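
The pre-shuffle step can be sketched without Spark. The example below uses plain Java collections as a stand-in for a microbatch. The Edit class, its fields and the sample payloads are hypothetical, and a real pipeline would use the partitioning facilities of the streaming framework instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MicrobatchGrouping {
    // Hypothetical edit record: the branch it belongs to plus an opaque payload.
    static final class Edit {
        final String branchId;
        final String payload;

        Edit(String branchId, String payload) {
            this.branchId = branchId;
            this.payload = payload;
        }
    }

    // Groups one microbatch so that all edits of a branch end up in the same partition.
    static Map<String, List<Edit>> groupByBranch(List<Edit> microbatch) {
        Map<String, List<Edit>> partitions = new LinkedHashMap<>();
        for (Edit e : microbatch) {
            partitions.computeIfAbsent(e.branchId, id -> new ArrayList<>()).add(e);
        }
        return partitions;
    }

    static Map<String, List<Edit>> demo() {
        return groupByBranch(Arrays.asList(
                new Edit("branch-1", "rename element a"),
                new Edit("branch-2", "delete element b"),
                new Edit("branch-1", "add element c")));
    }

    public static void main(String[] args) {
        // Each partition would then be applied by one thread in one transaction.
        demo().forEach((branch, edits) ->
                System.out.println(branch + ": " + edits.size() + " edit(s)"));
    }
}
```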

Best wishes,    Marc


timon.schneider@...
 

Hi all,

On Fri, Mar 5, 2021 at 05:32 PM, Ted Wilmes wrote:
Since you do not have ACID guarantees in most cases, I think you could still have a case where another transaction commits prior to your commit even though you saw isPublished = false when you check it. One possible way around this without ACID would be to process all mutations for a branch on one thread, effectively single threading access to it so that you could know that no other user was writing to the branch while you were reading.
I actually aim to keep the system ACID compliant. The one thing I struggle to implement in JG is that the edits may only be committed after the branch vertex has been locked and its isPublished property has been read. The problem is that JG doesn't offer select-for-update functionality. I need to read the branch vertex to get its ID and lock it, but while I'm getting it, the isPublished property can be set to true by another user. Getting the vertex, locking it, and then refreshing the data could be an option, but that's not supported by JG.

Isn't this a shortcoming of JG that many users have issues with?

I think the single-thread solution you suggest is not an option, as our application is a metadata editor where multiple users should be able to edit elements of a branch simultaneously.

@Bo Xuan Li
I'm very much concerned with the consistency of the data. The check on the branch vertex is just a read operation necessary to guarantee that the branch is not published at the point of persisting the edits.


Boxuan Li
 

Hi Timon, what exactly is your data storage setup? For example, if you are using Cassandra (with replication), then there is no guarantee that your current transaction can read the latest value right after another transaction commits (even if your transaction does not cache anything).

JanusGraph is a layer built on top of your storage backend. Unfortunately, it cannot provide any guarantee that is not provided by the underlying storage backend in the first place. If you are concerned about ACID, maybe you should use BerkeleyDB or FoundationDB.
 


timon.schneider@...
 

Currently using HBase.

Consider the following:
User A decides to set isPublished of Vertex X from false to true, does not commit yet.
User B changes isPublished of Vertex X from false to true and commits immediately.
User A commits and will get an error because the property value is not the same anymore as at the start of the transaction.

Why wouldn't it be possible for JG to provide user A with select for update functionality that allows user A to select vertex X for update, do edits to other elements, commit and get the same message as in the example above if the property on vertex X is changed?
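
To make the requested behaviour concrete, here is a toy model of a commit-time check on values that were only read. It is a sketch under strong assumptions: the check and the write are atomic only because everything lives in one process behind one lock, which a distributed storage backend does not automatically give you. ToyTx, readForUpdate and the key names are hypothetical and not JanusGraph APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class OptimisticCommitDemo {
    static final Map<String, Object> storage = new HashMap<>();

    static class ToyTx {
        private final Map<String, Object> observed = new HashMap<>();
        private final Map<String, Object> writes = new HashMap<>();

        // Reads a value and remembers it so that commit() can verify it is unchanged.
        Object readForUpdate(String key) {
            synchronized (storage) {
                Object value = storage.get(key);
                observed.put(key, value);
                return value;
            }
        }

        void write(String key, Object value) {
            writes.put(key, value);
        }

        // Applies the writes only if every observed value is still the same in storage.
        boolean commit() {
            synchronized (storage) {
                for (Map.Entry<String, Object> e : observed.entrySet()) {
                    if (!Objects.equals(storage.get(e.getKey()), e.getValue())) {
                        return false; // someone changed a value we depended on
                    }
                }
                storage.putAll(writes);
                return true;
            }
        }
    }

    // Returns {B committed, A committed, A's edit is present in storage}.
    static boolean[] demo() {
        storage.clear();
        storage.put("branch.isPublished", false);

        ToyTx userA = new ToyTx();
        userA.readForUpdate("branch.isPublished"); // A selects the branch "for update"
        userA.write("element-1", "edited by A");   // A edits other elements

        ToyTx userB = new ToyTx();
        userB.readForUpdate("branch.isPublished");
        userB.write("branch.isPublished", true);
        boolean bCommitted = userB.commit();       // B publishes first

        boolean aCommitted = userA.commit();       // A's check fails at commit time
        return new boolean[] {bCommitted, aCommitted, storage.containsKey("element-1")};
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("B=" + r[0] + " A=" + r[1] + " A's edit stored=" + r[2]);
    }
}
```

In this model A never writes to the branch vertex, yet A's commit is rejected because the value it read has changed in the meantime.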


Boxuan Li
 

My thoughts are:

1) As you said, you wanted to be able to disable the transaction cache so that you can read from database again. I have provided a solution here: https://lists.lfaidata.foundation/g/janusgraph-users/message/5668 A PR is available here: https://github.com/JanusGraph/janusgraph/pull/2502
2) An alternative approach (apart from your external locking approach) is to use JanusGraph built-in locking mechanism, as we have discussed here: https://groups.google.com/g/janusgraph-users/c/WzsO78ndobA/m/e6GzFXI5CQAJ

Although the above approaches will likely work most of the time, they are not guaranteed to be robust due to the eventually consistent nature of HBase. If you need ACID, you should switch to the FoundationDB backend. To the best of my knowledge, there is no way for JanusGraph to provide ACID on top of an eventually consistent storage backend, because the graph instances can only "communicate" with each other via the underlying storage backend.


timon.schneider@...
 

Thanks for your thoughts.
1) I'm very interested to try out the PR you made for this issue.
2) I don't think the solution you gave me in that previous thread solves the issue. What if another user sets version_v.published to true between steps 3 and 4? This is allowed even with ConsistencyModifier.LOCK on the vertex and properties of version_v.

1. start_transaction();
2. read_vertex(type_v);
3. read_vertex(version_v); // type_v ——hasVersion—> version_v
4. if (version_v.published == true) then abort();
5. update_vertex(type_v);
6. update_vertex(version_v); // set version_v.published = true
7. commit();