Database Design – Which Isolation Levels Can Cause Serialization Failure?

database-design

The quote below from Wikipedia (here) says that READ COMMITTED doesn't cause a serialization failure! Can you please explain why? Which isolation levels can cause a serialization failure?

Under multiversion concurrency control, at the SERIALIZABLE isolation
level, both SELECT queries see a snapshot of the database taken at the
start of Transaction 1. Therefore, they return the same data. However,
if Transaction 2 then attempted to UPDATE that row as well, a
serialization failure would occur and Transaction 1 would be forced to
roll back.

At the READ COMMITTED isolation level, each query sees a snapshot of
the database taken at the start of each query. Therefore, they each
see different data for the updated row. No serialization failure is
possible in this mode (because no promise of serializability is made),
and Transaction 1 will not have to be retried.

Best Answer

It's impossible to give a brief answer to this.

I suggest that you start reading here:

If you can read and grasp all of that material (and the links within - the book is the cherry on the cake but not strictly required just to answer this question), you'll be well on the way to understanding transactions and isolation levels!

Edit:

In response to the OP's comments: by serialization failure, what I believe the author of the wiki (which is here - please quote references in future if possible) means is that in the first case T1 will fail because the isolation level is set to SERIALIZABLE, whereas in the second case a level of READ COMMITTED means that T1 will not fail.

The important thing to realise about SERIALIZABLE transactions is that (from here):

a transaction schedule is serializable if its outcome (e.g., the resulting database state) is equal to the outcome of its transactions executed serially, i.e. without overlapping in time.

So, T1 will fail because its view of the data cannot be kept consistent for the entire duration of the transaction (i.e. across all of its queries) due to the modification made by T2.
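
To make that concrete, here is a minimal sketch of the scenario in two CLI sessions. I'm assuming PostgreSQL syntax and error message, and a hypothetical table called accounts - adjust for your server of choice:

    -- Session 1 (Transaction 1), SERIALIZABLE
    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    SELECT balance FROM accounts WHERE id = 1;    -- snapshot taken at the start of T1

    -- Session 2 (Transaction 2), running concurrently
    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    COMMIT;

    -- Back in Session 1
    SELECT balance FROM accounts WHERE id = 1;    -- still sees the OLD value (same snapshot)
    UPDATE accounts SET balance = balance + 100 WHERE id = 1;
    -- ERROR: could not serialize access due to concurrent update
    ROLLBACK;                                     -- T1 is forced to roll back and retry

Run the same thing at READ COMMITTED and the second SELECT in Session 1 simply sees the value committed by Session 2, the UPDATE proceeds against that committed row, and no failure occurs - exactly the difference the quoted article describes.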

What I have found (and this confused me for a while until I teased it through) is that there is a BIG DIFFERENCE between a query and a transaction - a query may or may not be a proper subset of a transaction. On most modern RDBMSs, if you don't explicitly specify a BEGIN TRANSACTION or an isolation level, the defaults are that:

  • a) a transaction is started for you automatically under the hood, and

  • b) that transaction runs at READ COMMITTED - which normally doesn't matter much since it's a single query (see the sketch below).
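
To illustrate (a sketch, assuming PostgreSQL and the same hypothetical accounts table - the syntax for inspecting and setting the level varies between servers):

    -- In autocommit mode, each standalone statement is its own little transaction:
    SELECT balance FROM accounts WHERE id = 1;    -- implicit BEGIN ... COMMIT around it

    SHOW default_transaction_isolation;           -- typically 'read committed'

    -- An explicit transaction can override the default for its own duration:
    BEGIN;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    SELECT balance FROM accounts WHERE id = 1;
    COMMIT;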

The distinction really becomes important when you start running multi-query transactions - that's when the wheat is sorted from the chaff, and the developer has to understand what's going on in the server regarding transaction isolation levels.
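
For example, at READ COMMITTED two queries inside one transaction can legitimately disagree with each other. Again a sketch, with PostgreSQL and the hypothetical accounts table assumed:

    -- Session 1, default READ COMMITTED: each query takes its own snapshot
    BEGIN;
    SELECT balance FROM accounts WHERE id = 1;    -- returns, say, 500

    -- Session 2, meanwhile:
    --   UPDATE accounts SET balance = 400 WHERE id = 1; COMMIT;

    SELECT balance FROM accounts WHERE id = 1;    -- now returns 400: a non-repeatable read
    COMMIT;                                       -- no error, but the two reads disagree

At REPEATABLE READ or SERIALIZABLE both reads would return 500, and if Session 1 then tried to UPDATE that row it would hit the serialization failure shown earlier.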

As with all things in IT, and databases in particular, there are balances, trade-offs and compromises. Why, you might ask, is SERIALIZABLE not the default and/or not always specified? The answer is simple: performance - there's your trade-off!

I would urge you now to read the references I've provided - do some experiments with two CLI windows on your server of choice (or multiple servers if you really want to get into it!) running some of the code referenced!