Sql-server – Sql Server – Select while insert on another transaction gives unexpected results

indexsql serversql-server-2008-r2transaction

I stumbled at situation that changes my knowledge about transactions and locking fundamentally (I don't know much though), and I need help to understand it.

Let's say I have a table like this:

CREATE TABLE [dbo].[SomeTable](
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[SomeData] [varchar](200) NOT NULL,
[Moment] [datetime] NOT NULL,
[SomeInt] [bigint] NOT NULL
) ON [PRIMARY]

and I run this "insert 1000 rows within a transaction" query:

BEGIN TRAN t1

DECLARE @i INT = 0

WHILE @i < 1000
BEGIN
    SET @i = @i + 1

    INSERT INTO [SomeTable] ([SomeData] ,Moment, SomeInt)
    VALUES (CONVERT(VARCHAR(255), NEWID()), getdate(), @i)

    WAITFOR DELAY '00:00:00:010'
END

COMMIT TRAN t1

While this transaction is running, I'm executing a simple select:

SELECT Id, Moment, SomeData, SomeInt FROM [SomeTable]

It isn't always possible to reproduce it (apparently depends on timings) but sometimes select query will, after insert transaction finishes, return less than 1000 rows. In my ignorance I have believed that select will always return 1000 rows (given that isolation level is Read Committed), but obviously I have misunderstood how transactions and locking work.

However, if I put a primary key on Id column (which generates a clustered index), select query will, as long as I've tried, return all 1000 rows. Putting indexes some other way, with clustered index on composite key and a non-clustered index on some other columns, may again result in returning less number of rows than I've expected.

So, I have these questions:

  1. Why select doesn't always return all rows committed by transaction?
  2. If this is expected behavior what's the best way of actually making it work as I have expected? Basically, I want select to return the state of the table after (or before) the transaction, not some half-done data. Snapshot isolation is currently not an option. Putting TABLOCK seems to be doing the work, but is there a better solution? In real life I have tables that I wouldn't want to lock on this level if it is not absolutely necessary.
  3. Why putting an index changes this behavior?

Thanks in advance.

Best Answer

I haven't managed to reproduce this after running your code a few times.

I presume that it must happen when a later row gets inserted onto an earlier page in the file though.

So the order of operations is (for example)

  • Rows inserted into heap on pages 200, 207, 223
  • Select statement starts and performs an allocation ordered scan. Finds that the first page is 200 and is blocked waiting on a row lock to be released.
  • Other rows are inserted by the first transaction. Some of them are allocated on a page before 200. Insert transaction commits.
  • Row lock released and continues allocation ordered scan. Rows earlier in the file are missed.

The table comprised 10 pages. By default the first 8 pages will be allocated from mixed extents and then it will be allocated a uniform extent. Maybe in your case space was available in the file for a free uniform extent prior to the mixed extents that were used.

You can test this theory by running the following in a different window after you have reproduced the issue and seeing if the missing rows from the original SELECT all appear at the beginning of this resultset.

SELECT [SomeData],
       Moment,
       SomeInt,
       file_id,
       page_id,
       slot_id
FROM   [SomeTable] 
/*Undocumented - Use at own risk*/
CROSS APPLY sys.fn_PhysLocCracker(%% physloc %%)
ORDER BY page_id, SomeInt

The operation against an indexed table will be in index key order rather than allocation order so will not be affected by this particular scenario.

An allocation ordered scan can be carried out against an index but it is only considered if the table is sufficiently large and the isolation level is read uncommitted or a table lock is held.

Because read committed generally releases locks as soon as the data is read it is possible at for a scan against the index to read rows twice or not at all (if the index key is updated by a concurrent transaction causing the row to move forward or back) See The Read Committed Isolation Level for more discussion about this type of issue.


By the way I was originally envisaging for the indexed case that the index was on one of the columns that increases relative to insert order (any of Id, Moment, SomeInt). However even if the clustered index is on the random SomeData the issue still doesn't arise.

I tried

DBCC TRACEON(3604, 1200, -1) /*Caution. Global trace flag. Outputs lock info
                               on every connection*/

SELECT TOP 2 *,
             %%LOCKRES%%
FROM   [SomeTable] WITH(nolock)
ORDER BY [SomeData];

SELECT *,
       %%LOCKRES%%
FROM   [SomeTable]
ORDER BY [SomeData];

/*Turn off trace flags. Doesn't check whether or not they were on already 
  before we started, with TRACEOFF*/
DBCC TRACEOFF(3604, 1200, -1)

Results were as below

enter image description here

The second resultset includes all 1,000 rows. The locking info shows that even though it was blocked waiting on lock resource 24c910701749 when the lock was released it doesn't just continue the scan from that point. Instead it immediately releases that lock and acquires a row lock on the new first row.

enter image description here