Why does the SERIALIZABLE transaction isolation level not guarantee point-in-time views of data?

isolation-level, snapshot-isolation, sql-server, t-sql, transaction

I have been told that if your transaction consists of multiple statements, then the SNAPSHOT transaction isolation level is the only way to guarantee consistent point-in-time views of data. Why is the SERIALIZABLE transaction isolation level inadequate for this task? By design, the locks that SERIALIZABLE holds are very tight.

I think the gap in my understanding is that I am unsure when SERIALIZABLE takes its very tight locks. A script like the below will likely be very helpful in finding what I am missing.

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN TRANSACTION;

-- First read: any locks are taken here, not at BEGIN TRANSACTION
SELECT TOP (1) * FROM [Hit_Me_First];

-- Window for a second session to change data
WAITFOR DELAY '00:02:00';

-- Second read: what does this see?
SELECT TOP (1) * FROM [Hit_Me_Second];

COMMIT TRANSACTION;

Best Answer

People always seem to want to understand isolation levels in terms of locks.

This is somewhat strange, because isolation levels are generally defined not by locking but by the concurrency phenomena* one might experience at each level.

Some products happen to implement some isolation levels using locking, but this isn't true for all levels or products. For example, SQL Server has both lock-based and versioning-based implementations for the read committed isolation level.

Even serializable isolation is implemented in SQL Server without locking when you use the in-memory OLTP (Hekaton) storage engine. PostgreSQL also implements serializable using a versioning scheme by the way.

I have been told that if your transaction consists of multiple statements, then the SNAPSHOT transaction isolation level is the only way to guarantee consistent point-in-time views of data.

Snapshot isolation does guarantee you see only committed data as it was at the start of the transaction. The start of the transaction is the first time you access versioned data within the transaction, not when you issue BEGIN TRAN.
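This is easy to demonstrate. The sketch below assumes a database with ALLOW_SNAPSHOT_ISOLATION enabled and borrows the hypothetical [Hit_Me_First] table from the question:

SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

BEGIN TRANSACTION;  -- the snapshot is NOT taken yet

-- Changes committed by other sessions during this delay...
WAITFOR DELAY '00:00:30';

-- ...ARE visible here: the snapshot begins at this first access of versioned data
SELECT TOP (1) * FROM [Hit_Me_First];

-- Every later read in the transaction repeats that same snapshot
SELECT TOP (1) * FROM [Hit_Me_First];

COMMIT TRANSACTION;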

Serializable also provides a consistent view of the data as it was at a point in time, but that point in time is generally not the start of the transaction. To understand this, consider the SQL Standard definition (emphasis added):

A serializable execution is defined to be an execution of the operations of concurrently executing SQL-transactions that produces the same effect as some serial execution of those same SQL-transactions. A serial execution is one in which each SQL-transaction executes to completion before the next SQL-transaction begins.

If the effects are the same as if all transactions ran serially, in some order, you will clearly see only committed data as it was when your transaction ran in its place in that serial order (i.e. with no concurrent activity). The crucial difference is that you cannot know in advance what that serial order was.

If you really must think about this through the locking implementation, consider that SQL Server takes sufficient locks to ensure all data needed by your transaction does not change after it is read. Further, no new data that would be seen by your transaction if it were re-run can be added until your transaction ends (no 'phantoms').
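The phantom protection can be observed directly with two sessions. This sketch assumes a hypothetical table [Hit_Me_First] with an indexed integer column [id]:

-- Session 1
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;

-- Key-range locks are taken to protect this predicate
SELECT * FROM [Hit_Me_First] WHERE [id] BETWEEN 1 AND 10;

-- ...leave the transaction open...

-- Session 2: this blocks until session 1 ends, because the new row
-- would appear as a phantom if session 1's SELECT were re-run
INSERT [Hit_Me_First] ([id]) VALUES (5);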

Through that lens, the point-in-time view runs from the moment the last lock needed by your transaction was acquired until the end of your transaction. For convenience, you could take that point to be the time the transaction ended.

Again, I emphasise that this is a consequence of an implementation decision. It could change, in principle. I would encourage you to understand concurrency through the logical guarantees provided and concurrency effects avoided.

For more, see my article The Serializable Isolation Level and the rest of the series for other isolation levels, including snapshot isolation.


* Any effect due to concurrency that would not occur if the transaction ran alone