The basic technique is quite straightforward. When you read the record, you take note of the version or timestamp column, e.g.
Select FooID
,Foo
,Bar
,TS -- timestamp
from Foobar
where FooID = @FooID
When you go to write the record back, you filter the write by the timestamp/version so that it writes nothing if the timestamp/version has changed. This makes the write atomic, e.g.
update Foobar
set Foo = @foo
,Bar = @bar
,TS = @new_timestamp -- write a new version/timestamp value...
where FooID = @FooID
and TS = @timestamp -- ...but only if the version we read earlier is still current
select @row_count = @@rowcount -- specific to T-SQL: @@rowcount is a system variable that
                               -- holds the number of rows affected by the most recent
                               -- statement. Other DBMS platforms expose this differently.
if @row_count = 0
    [deal with outdated record]
This allows an application to do the update without holding locks open. This is necessary
for n-tier systems working through a connection pool, and prevents a class of deadlocks
that used to be common on two-tier client-server systems.
There is nothing enforced in the database about this. It's all done explicitly by the application.
Duplicate question from:
https://stackoverflow.com/questions/129329/optimistic-vs-pessimistic-locking
Copy/Pasting answer from the above link:
Optimistic Locking is a strategy where you read a record, take note of a version number and check that the version hasn't changed before you write the record back. When you write the record back, you filter the update on the version to make sure it's atomic (i.e. the record hasn't been updated between your read and your write) and you update the version in one hit.
If the record is dirty (i.e. a different version to yours), you abort the transaction and the user can re-start it.
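For example, the same pattern with an integer version column instead of a timestamp (a minimal sketch; the Version column name is illustrative):
select Foo
,Bar
,Version -- remember the version along with the data
from Foobar
where FooID = @FooID

update Foobar
set Foo = @foo
,Bar = @bar
,Version = Version + 1 -- advance the version in the same hit
where FooID = @FooID
and Version = @version -- affects zero rows if another writer got in first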
This strategy is most applicable to high-volume systems and three-tier architectures where you do not necessarily maintain a connection to the database for your session. In this situation the client cannot actually maintain database locks as the connections are taken from a pool and you may not be using the same connection from one access to the next.
Pessimistic Locking is when you lock the record for your exclusive use until you have finished with it. It has much better integrity than optimistic locking but requires you to be careful with your application design to avoid deadlocks. To use pessimistic locking you need either a direct connection to the database (as would typically be the case in a two-tier client-server application) or an externally available transaction ID that can be used independently of the connection.
In the latter case you open the transaction with the TxID and then reconnect using that ID. The DBMS maintains the locks and allows you to pick the session back up through the TxID. This is how distributed transactions using two-phase commit protocols (such as XA or COM+ Transactions) work.
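In SQL the pessimistic style is typically done with SELECT ... FOR UPDATE. A sketch (MySQL/PostgreSQL-style syntax; table and values are illustrative):
start transaction;
select Foo, Bar
from Foobar
where FooID = 42
for update; -- row is now locked; other writers block until we finish

-- ... the application works with the data while the lock is held ...

update Foobar
set Foo = 'new value'
where FooID = 42;
commit; -- releases the lock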
Edit (Adding more info to address the performance question):
Performance-wise, it depends on your environment. Consider the following factors:
In most situations you're going to find that optimistic locking is better because of the concurrency it allows, though depending on the RDBMS and environment it may be more or less performant. Typically, optimistic locking requires the row to be versioned somewhere.
With MS SQL Server, for example, the row versions are kept in TempDB and roughly 12-14 bytes are appended to the end of each row. Turning on optimistic locking with an isolation level such as Snapshot Isolation can cause fragmentation, and your fill factor will need adjusting: the rows now carry additional data at the end, which can push a nearly-full page into a page split and lower your performance. If your TempDB is under-optimized, this will not be as fast.
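For reference, turning on row versioning in SQL Server looks like this (MyDb is a placeholder database name):
ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;
-- optionally make plain READ COMMITTED use row versioning as well:
ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON;

-- then, in a session that wants snapshot semantics:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;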
So I guess a checklist is:
- Do you have sufficient IO/resources to handle this form of row versioning? If not, you are adding overhead. If so, and you are reading the data often while also locking it for writes, you will notice a good improvement in concurrency across reads and writes (although writes will still block writes, reads will no longer block writes and vice versa).
- Is your code susceptible to deadlocks, or do you experience locking? If you are not experiencing long locks or a lot of deadlocks, then the additional overhead of optimistic locking won't make things faster; of course, in most cases we're talking milliseconds here.
- If your DB is big (or on very limited hardware) and your data pages are near full, then depending on the RDBMS you could cause major page splits and data fragmentation, so make sure to consider reindexing after turning it on.
Those are my thoughts on the matter, open to hearing more from the community.
Best Answer
Your developer is mistaken. You need either SELECT ... FOR UPDATE or row versioning, not both.
Try it and see. Open three MySQL sessions (A), (B) and (C) to the same database.
In (C), issue a statement that takes a write lock on the table.
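A minimal sketch, assuming a test table with id, version and winner columns (the names are inferred from the statements later in this walkthrough):
-- setup, run once beforehand:
CREATE TABLE test (id INT PRIMARY KEY, version INT NOT NULL, winner VARCHAR(16));
INSERT INTO test (id, version, winner) VALUES (1, 0, NULL);

-- in session (C): hold a table write lock so the updates from (A) and (B) queue up
LOCK TABLES test WRITE;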
In both (A) and (B), issue an UPDATE that tests and sets the row version, changing the winner text in each so you can see which session is which.
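A sketch for session (A); (B) issues the same statement with winner = 'B':
-- the WHERE clause tests the version we read; the SET advances it in the same hit,
-- so only one of the two racing sessions can match version = 0
UPDATE test SET version = version + 1, winner = 'A'
WHERE id = 1 AND version = 0;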
Now in (C), UNLOCK TABLES; to release the lock. (A) and (B) will race for the row lock. One of them will win and get the lock; the other will block on the lock. The winner who got the lock will proceed to change the row. Assuming (A) is the winner, you can now see the changed row (still uncommitted, so not visible to other transactions) with a SELECT * FROM test WHERE id = 1.
Now COMMIT in the winner session, say (A). (B) will get the lock and proceed with its update. However, the version no longer matches, so it will change no rows, as reported by the row count result. Only one UPDATE had any effect, and the client application can clearly see which UPDATE succeeded and which failed. No further locking is necessary.
See session logs at pastebin here. I used mysql --prompt="A> " etc. to make it easy to tell the difference between sessions. I copied and pasted the output interleaved in time sequence, so it's not totally raw output and it's possible I could've made errors copying and pasting it. Test it yourself to see.
If you had not added a row version field, then you would need SELECT ... FOR UPDATE to be able to reliably ensure ordering.
If you think about it, a SELECT ... FOR UPDATE is completely redundant if you're immediately doing an UPDATE without re-using data from the SELECT, or if you're using row versioning. The UPDATE will take a lock anyway. If someone else updates the row between your read and your subsequent write, your version won't match any more, so your update will fail. That's how optimistic locking works.
The purpose of SELECT ... FOR UPDATE is to lock a row while you read it and act on its contents before writing it back, when you are not relying on SERIALIZABLE isolation or row versioning. You do not need to use both optimistic locking (row versioning) and SELECT ... FOR UPDATE. Use one or the other.