Sql-server – Why are timestamps not always increasing with concurrent inserts

concurrencysql serversql-server-2008-r2timestamp

I'm seeing some unexpected behavior with timestamp (rowversion) columns . I created a test table:

create table Test
(
    Test_Key int identity(1,1) primary key clustered,
    Test_Value int,
    Test_Thread int,
    ts timestamp
)

create nonclustered index IX_Test_Value on Test (Test_Value) -- probably irrelevant

I started two threads running inserts into this table at the same time. The first thread is running the following code:

declare @i int = 0
while @i < 100
begin
    insert into Test (Test_Value, Test_Thread) select n, 1 from dbo.fn_GenerateNumbers(10000)
    set @i = @i + 1
end

The second thread is running identical code, except that it is doing select n, 2 from the function to insert its thread ID.

First, a word about the function. This uses a series of cross-joined common table expressions with a ROW_NUMBER() to return a lot of numbers in sequence very quickly. I learned this trick from an article by Itzik Ben-Gan, so credit goes to him for it. I don't think the implementation of the function matters, but I will include it anyway:

CREATE FUNCTION dbo.fn_GenerateNumbers(@count int)
RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
    WITH
        Nbrs_4( n ) AS ( SELECT 1 UNION SELECT 0 ),
        Nbrs_3( n ) AS ( SELECT 1 FROM Nbrs_4 n1 CROSS JOIN Nbrs_4 n2 ),
        Nbrs_2( n ) AS ( SELECT 1 FROM Nbrs_3 n1 CROSS JOIN Nbrs_3 n2 ),
        Nbrs_1( n ) AS ( SELECT 1 FROM Nbrs_2 n1 CROSS JOIN Nbrs_2 n2 ),
        Nbrs_0( n ) AS ( SELECT 1 FROM Nbrs_1 n1 CROSS JOIN Nbrs_1 n2 ),
        Nbrs  ( n ) AS ( SELECT 1 FROM Nbrs_0 n1 CROSS JOIN Nbrs_0 n2 )

    SELECT n
    FROM ( SELECT ROW_NUMBER() OVER (ORDER BY n) FROM Nbrs ) D ( n )
    WHERE n <= @count ; 

This table has an identity column on it. I expected that when I selected the values from the table by this monotonically increasing primary key, I would see the timestamps in the same order, too. The timestamps might not be sequential, because there might have been other updates, but they would at least be in order.

However, what I am seeing is different. The inserts are interleaving by primary key, but the timestamps are sequential by thread.

Test_Key Test_Value Test_Thread ts
-------- ---------- ----------- ------------------
20227    227        1           0x000000006EDF3BC5
20228    228        1           0x000000006EDF3BC6
20229    229        1           0x000000006EDF3BC7
20230    230        1           0x000000006EDF3BC8
20231    1          2           0x000000006EDF41E9 -- thread 2 starts with a new ts
20232    2          2           0x000000006EDF41EB
20233    3          2           0x000000006EDF41EC
20234    4          2           0x000000006EDF41ED
--<snip lots of thread 2 inserts>
21538    1308       2           0x000000006EDF4710
21539    1309       2           0x000000006EDF4711
21540    1310       2           0x000000006EDF4712
21541    1311       2           0x000000006EDF4713
21542    231        1           0x000000006EDF3BC9 -- This is less than the prior row!
21543    232        1           0x000000006EDF3BCA -- Thread 1 is inserting
21544    233        1           0x000000006EDF3BCB -- from its last ts value
21545    234        1           0x000000006EDF3BCC

My question is:

1) Why is the timestamp not always increasing with concurrent inserts?

Bonus points if you can answer this question:

2) Why are the concurrent inserts overlapping the primary key instead of all being inserted at once? Each insert is running its own implicit transaction, so I expected the primary keys to be in order for a single thread's insert. I did not expect the primary keys to be interleaved.

I don't know enough about replication to answer this one:

3) Do having timestamps out of order cause a problem with replication? In the above example, what if thread 2 commits its data first? When thread 1 completes, its timestamps are all lower than the records inserted by thread 2.

I peeked at the executing requests and verified they are not running parallel, so I don't think parallelism is the problem.

Note that this query was running in the default (READ COMMITTED) isolation level. If I increase the isolation level to SERIALIZABLE, I still get timestamps in reverse order when threads change.

I am testing this on SQL Server 2008 R2.

To check the timestamp orders, I was doing a select * from Test, and I was also using the following queries:

-- find timestamps out of sequential order
select t1.*, t2.*
from Test t1
    inner join Test t2
        on t2.Test_Key = t1.Test_Key + 1
where
    t2.ts <> t1.ts + 1

-- find timestamps that are less than the prior timestamp
select t1.*, t2.*
from Test t1
    inner join Test t2
        on t2.Test_Key = t1.Test_Key + 1
where
    t2.ts < t1.ts

Best Answer

The IDENTITY generator is not well documented. There are some behaviors however that can be observed that seem relevant:

  1. The identity generation does not get affected by transactions. That means once a value has been used it will not be reused, even if the transaction causing its use is rolled back.

  2. Not every use causes an update of the sequence position being written back to the database. You can see that for example after a crash. Often the next used value after a crash is several numbers higher than the previous.

While there is no proof (meaning documentation), it can be assumed that for performance reasons a multi-row insert grabs a block of identity values and uses them until it runs out. Another concurrent thread will get the next block of numbers. At this point the identity value does not actually reflect the order of inserts anymore.

The rowversion data type on the other hand is an ever increasing number that would reflect insert order. (timestamp is a deprecated synonym for rowversion.)

So in your case you can assume that the rows were inserted in the order of the rowversion column and that the out-of-order identity value is caused by in memory performance optimizations.

By the way, while the IDENTITY generator is not very well documented, the new 2012 SEQUENCE functionality is. Here you can read all about the behaviors described above in sequences.

As for your concern with replication:

  1. Transactional replication is using the database log and does not rely on specific column values.

  2. Merge replication uses a rowguid column to identify a row. This is a column that gets valued once and does not change throughout the life of the row. Merge replication does not use a rowversion column. Transactional consistency is enforced by the fact that at the time of a synchronization, normal locking is used, so a transaction is either completely visible to the merge agent or completely invisible.

  3. Snapshot replication does not look for changes at all. It just takes the at the time of the synchronization committed data and copies it over.