Database Design – Which Isolation Levels Can Cause Serialization Failure?

database-design

The quote below from Wikipedia (here) says that READ COMMITTED doesn't cause a serialization failure! Can you please explain why? Which isolation levels can cause a serialization failure?

Under multiversion concurrency control, at the SERIALIZABLE isolation
level, both SELECT queries see a snapshot of the database taken at the
start of Transaction 1. Therefore, they return the same data. However,
if Transaction 2 then attempted to UPDATE that row as well, a
serialization failure would occur and Transaction 1 would be forced to
roll back.

At the READ COMMITTED isolation level, each query sees a snapshot of
the database taken at the start of each query. Therefore, they each
see different data for the updated row. No serialization failure is
possible in this mode (because no promise of serializability is made),
and Transaction 1 will not have to be retried.

Best Answer

It's impossible to give a brief answer to this.

I suggest that you start reading here:

If you can read and grasp all of that material (and the links within - the book is the cherry on the cake but not strictly required just to answer this question), you'll be well on the way to understanding transactions and isolation levels!

Edit:

In response to the OP's comments: by serialization failure, what I believe the author of the wiki (which is here - please quote references in future if possible) means is that in the first case T1 will fail because the isolation level is set to SERIALIZABLE, whereas in the second case a level of READ COMMITTED means that T1 will not fail.

The important thing to realise about SERIALIZABLE transactions is that (from here):

a transaction schedule is serializable if its outcome (e.g., the resulting database state) is equal to the outcome of its transactions executed serially, i.e. without overlapping in time.

So, T1 will fail because its view of the data cannot be kept consistent for the entire duration of the transaction (i.e. across all of its queries) due to the modification made by T2.
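
To make that concrete, here is a minimal sketch of the scenario in two CLI sessions. I'm assuming PostgreSQL syntax and error message, and a hypothetical table called accounts - adjust for your server of choice:

    -- Session 1 (Transaction 1), SERIALIZABLE
    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    SELECT balance FROM accounts WHERE id = 1;    -- snapshot taken at the start of T1

    -- Session 2 (Transaction 2), running concurrently
    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    COMMIT;

    -- Back in Session 1
    SELECT balance FROM accounts WHERE id = 1;    -- still sees the OLD value (same snapshot)
    UPDATE accounts SET balance = balance + 100 WHERE id = 1;
    -- ERROR: could not serialize access due to concurrent update
    ROLLBACK;                                     -- T1 is forced to roll back and retry

Run the same thing at READ COMMITTED and the second SELECT in Session 1 simply sees the value committed by Session 2, the UPDATE proceeds against that committed row, and no failure occurs - exactly the difference the quoted article describes.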

What I have found (and this confused me for a while until I teased it through) is that there is a BIG DIFFERENCE between a query and a transaction - a query may or may not be a proper subset of a transaction. On most modern RDBMSs, if you don't explicitly specify a BEGIN TRANSACTION or an isolation level, the defaults are that:

  • a) a transaction is started for you automatically under the hood, and

  • b) that transaction runs at READ COMMITTED - which normally doesn't matter much since it's a single query (see the sketch below).
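
To illustrate (a sketch, assuming PostgreSQL and the same hypothetical accounts table - the syntax for inspecting and setting the level varies between servers):

    -- In autocommit mode, each standalone statement is its own little transaction:
    SELECT balance FROM accounts WHERE id = 1;    -- implicit BEGIN ... COMMIT around it

    SHOW default_transaction_isolation;           -- typically 'read committed'

    -- An explicit transaction can override the default for its own duration:
    BEGIN;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    SELECT balance FROM accounts WHERE id = 1;
    COMMIT;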

The distinction really becomes important when you start running multi-query transactions - that's when the wheat is sorted from the chaff, and the developer has to understand what's going on in the server regarding transaction isolation levels.
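
For example, at READ COMMITTED two queries inside one transaction can legitimately disagree with each other. Again a sketch, with PostgreSQL and the hypothetical accounts table assumed:

    -- Session 1, default READ COMMITTED: each query takes its own snapshot
    BEGIN;
    SELECT balance FROM accounts WHERE id = 1;    -- returns, say, 500

    -- Session 2, meanwhile:
    --   UPDATE accounts SET balance = 400 WHERE id = 1; COMMIT;

    SELECT balance FROM accounts WHERE id = 1;    -- now returns 400: a non-repeatable read
    COMMIT;                                       -- no error, but the two reads disagree

At REPEATABLE READ or SERIALIZABLE both reads would return 500, and if Session 1 then tried to UPDATE that row it would hit the serialization failure shown earlier.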

As with all things in IT, and databases in particular, there are balances, trade-offs and compromises. Why, you might ask, is SERIALIZABLE not the default and/or not always specified? The answer is simple: performance - there's your trade-off!

I would urge you now to read the references I've provided - do some experiments with two CLI windows on your server of choice (or multiple servers if you really want to get into it!) running some of the code referenced!