The basic technique is quite straightforward. When you read the record, you take note of the version or timestamp column, e.g.
Select FooID
,Foo
,Bar
,TS -- timestamp
from Foobar
where FooID = @FooID
When you go to write the record back, you filter the write by the timestamp/version so that it writes nothing if the timestamp/version has changed. This makes the write atomic, e.g.
update Foobar
set Foo = @foo
,Bar = @bar
,TS = @new_timestamp -- write a new version/timestamp value...
where FooID = @FooID
and TS = @timestamp -- ...but only if the version we read earlier is still current
select @row_count = @@rowcount -- specific to T-SQL: @@rowcount is a system variable that
                               -- holds the number of rows affected by the most recent
                               -- statement. Other DBMS platforms expose this differently.
if @row_count = 0
    [deal with outdated record]
This allows an application to do the update without holding locks open. This is necessary
for n-tier systems working through a connection pool, and prevents a class of deadlocks
that used to be common on two-tier client-server systems.
There is nothing enforced in the database about this. It's all done explicitly by the application.
Duplicate question from:
https://stackoverflow.com/questions/129329/optimistic-vs-pessimistic-locking
Copy/Pasting answer from the above link:
Optimistic Locking is a strategy where you read a record, take note of a version number and check that the version hasn't changed before you write the record back. When you write the record back, you filter the update on the version to make sure it's atomic (i.e. the record hasn't been updated between your read and your write) and you update the version in one hit.
If the record is dirty (i.e. a different version to yours), you abort the transaction and the user can re-start it.
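For example, the same pattern with an integer version column instead of a timestamp (a minimal sketch; the Version column name is illustrative):
select Foo
,Bar
,Version -- remember the version along with the data
from Foobar
where FooID = @FooID

update Foobar
set Foo = @foo
,Bar = @bar
,Version = Version + 1 -- advance the version in the same hit
where FooID = @FooID
and Version = @version -- affects zero rows if another writer got in first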
This strategy is most applicable to high-volume systems and three-tier architectures where you do not necessarily maintain a connection to the database for your session. In this situation the client cannot actually maintain database locks as the connections are taken from a pool and you may not be using the same connection from one access to the next.
Pessimistic Locking is when you lock the record for your exclusive use until you have finished with it. It has much better integrity than optimistic locking but requires you to be careful with your application design to avoid deadlocks. To use pessimistic locking you need either a direct connection to the database (as would typically be the case in a two-tier client-server application) or an externally available transaction ID that can be used independently of the connection.
In the latter case you open the transaction with the TxID and then reconnect using that ID. The DBMS maintains the locks and allows you to pick the session back up through the TxID. This is how distributed transactions using two-phase commit protocols (such as XA or COM+ Transactions) work.
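In SQL the pessimistic style is typically done with SELECT ... FOR UPDATE. A sketch (MySQL/PostgreSQL-style syntax; table and values are illustrative):
start transaction;
select Foo, Bar
from Foobar
where FooID = 42
for update; -- row is now locked; other writers block until we finish

-- ... the application works with the data while the lock is held ...

update Foobar
set Foo = 'new value'
where FooID = 42;
commit; -- releases the lock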
Edit (Adding more info to address the performance question):
Performance-wise, it depends on your environment. Consider the following factors:
In most situations you're going to find that optimistic locking is better because of the concurrency it allows, though depending on the RDBMS and environment it may be more or less performant. Typically, optimistic locking requires the row to be versioned somewhere.
With MS SQL Server, for example, the row versions are kept in TempDB and roughly 12-14 bytes are appended to the end of each row. Turning on optimistic locking with an isolation level such as Snapshot Isolation can cause fragmentation, and your fill factor will need adjusting: the rows now carry additional data at the end, which can push a nearly-full page into a page split and lower your performance. If your TempDB is under-optimized, this will not be as fast.
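For reference, turning on row versioning in SQL Server looks like this (MyDb is a placeholder database name):
ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;
-- optionally make plain READ COMMITTED use row versioning as well:
ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON;

-- then, in a session that wants snapshot semantics:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;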
So I guess a checklist is:
- Do you have sufficient IO/resources to handle this form of row versioning? If not, you are adding overhead. If so, and you are reading the data often while also locking it for writes, you will notice a good improvement in concurrency across reads and writes (although writes will still block writes, reads will no longer block writes and vice versa).
- Is your code susceptible to deadlocks, or do you experience locking? If you are not experiencing long locks or a lot of deadlocks, then the additional overhead of optimistic locking won't make things faster; of course, in most cases we're talking milliseconds here.
- If your DB is big (or on very limited hardware) and your data pages are near full, then depending on the RDBMS you could cause major page splits and data fragmentation, so make sure to consider reindexing after turning it on.
Those are my thoughts on the matter, open to hearing more from the community.
Best Answer
Your developer is mistaken. You need either SELECT ... FOR UPDATE or row versioning, not both.
Try it and see. Open three MySQL sessions (A), (B) and (C) to the same database.
In (C), issue a statement that takes a write lock on the table.
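A minimal sketch, assuming a test table with id, version and winner columns (the names are inferred from the statements later in this walkthrough):
-- setup, run once beforehand:
CREATE TABLE test (id INT PRIMARY KEY, version INT NOT NULL, winner VARCHAR(16));
INSERT INTO test (id, version, winner) VALUES (1, 0, NULL);

-- in session (C): hold a table write lock so the updates from (A) and (B) queue up
LOCK TABLES test WRITE;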
In both (A) and (B), issue an UPDATE that tests and sets the row version, changing the winner text in each so you can see which session is which.
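A sketch for session (A); (B) issues the same statement with winner = 'B':
-- the WHERE clause tests the version we read; the SET advances it in the same hit,
-- so only one of the two racing sessions can match version = 0
UPDATE test SET version = version + 1, winner = 'A'
WHERE id = 1 AND version = 0;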
Now in (C), UNLOCK TABLES; to release the lock. (A) and (B) will race for the row lock. One of them will win and get the lock; the other will block on the lock. The winner who got the lock will proceed to change the row. Assuming (A) is the winner, you can now see the changed row (still uncommitted, so not visible to other transactions) with a SELECT * FROM test WHERE id = 1.
Now COMMIT in the winner session, say (A). (B) will get the lock and proceed with its update. However, the version no longer matches, so it will change no rows, as reported by the row count result. Only one UPDATE had any effect, and the client application can clearly see which UPDATE succeeded and which failed. No further locking is necessary.
See session logs at pastebin here. I used mysql --prompt="A> " etc. to make it easy to tell the difference between sessions. I copied and pasted the output interleaved in time sequence, so it's not totally raw output and it's possible I could've made errors copying and pasting it. Test it yourself to see.
If you had not added a row version field, then you would need SELECT ... FOR UPDATE to be able to reliably ensure ordering.
If you think about it, a SELECT ... FOR UPDATE is completely redundant if you're immediately doing an UPDATE without re-using data from the SELECT, or if you're using row versioning. The UPDATE will take a lock anyway. If someone else updates the row between your read and your subsequent write, your version won't match any more, so your update will fail. That's how optimistic locking works.
The purpose of SELECT ... FOR UPDATE is to lock a row while you read it and act on its contents before writing it back, when you are not relying on SERIALIZABLE isolation or row versioning. You do not need to use both optimistic locking (row versioning) and SELECT ... FOR UPDATE. Use one or the other.