SQL Server – Temp Table Clustered Key Not Honored: Bug or Expected?

sql servertemporary-tables

As I was putting some test sets of data together, I noticed some funny behavior with temp tables. When working with large sets of data in clustered temp tables that are populated via a parallel execution plan, the clustered key does not look to be honored when selecting data. This issue also seems to affect all versions of SQL Server that I've tested (include vNext).

Here's a dbfiddle.uk example of the test. You may have to execute it a couple of times to get the result I am finding, but it shouldn't take more than one or two executions to yield the same results. Additionally, this is the local execution plan I'm getting on my environment which shows that the only difference between the large and small data sets is the way data is fed into the tables (e.g. parallel vs serial plan).

If you want to play-at-home, here's the test I'm running:

-- Large Data Set
CREATE TABLE #tmp
(
    ID  INT PRIMARY KEY CLUSTERED
)

INSERT INTO #tmp
-- Purposely insert in reverse order
SELECT TOP 100 PERCENT RN
FROM
(
    SELECT TOP (10000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
    FROM master..spt_values t1
    CROSS JOIN master..spt_values t2
) x
ORDER BY RN DESC


-- Smaller Data Set
CREATE TABLE #tmp2
(
    ID  INT PRIMARY KEY CLUSTERED
)

INSERT INTO #tmp2
-- Purposely insert in reverse order
SELECT TOP 100 PERCENT RN
FROM
(
    SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
    FROM master..spt_values t1
    CROSS JOIN master..spt_values t2
) x
ORDER BY RN DESC

-- Large Record Set
-- Clustered Key Not Honored*
SELECT TOP 10 *
FROM #tmp

-- Small Record Set
-- Clustered Key Honored
SELECT TOP 10 *
FROM #tmp2

DROP TABLE #tmp
DROP TABLE #tmp2

I've not found any references indicating this is expected behavior, but before I submit a connect item, I first wanted to reach out and confirm this isn't a localized problem. Can someone either point me to documentation identifying this is expected behavior or alternatively confirm this is, in-fact a bug?

EDIT: In response to the comments about not including an ORDER BY clause, I was always under the assumption the TOP keyword returned the data in the order in which it was inserted, which should, in this case, be the order dictated by the clustered key. When running the same statement against a formal table, the expected behavior is returned:

-- Large Data Set with a Formal Data Table
CREATE TABLE tmp
(
    ID  INT PRIMARY KEY CLUSTERED
)

INSERT INTO tmp
-- Purposely insert in reverse order
SELECT TOP 100 PERCENT RN
FROM
(
    SELECT TOP (10000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
    FROM master..spt_values t1
    CROSS JOIN master..spt_values t2
) x
ORDER BY RN DESC

-- Large Record Set
-- Clustered Key Not Honored*
SELECT TOP 10 *
FROM tmp

DROP TABLE tmp

(6325225 row(s) affected)


(1 row(s) affected)
ID
-----------
1
2
3
4
5
6
7
8
9
10

(10 row(s) affected)



(1 row(s) affected)

Even the execution plans are the same, so why the different result sets between a temp table and a formally defined table?

Finally, a shout out to Joe Obbish as I gratuitously ripped off his CROSS JOIN approach to build large sets of test data as it's quite efficient!

Best Answer

There is no guarantee of ORDER without ORDER BY.

The execution plan for both has "Ordered = False".

enter image description here

This means you may get the results in key order but equally may not.

Specifically see When can allocation order scans be used?

The only time such a scan will be used is when there’s no possibility of the data changing (e.g. when the TABLOCK hint is specified, or when the table is in a read-only database) or when its explicitly stated that we don’t care (e.g. when the NOLOCK hint is specifed or under READ UNCOMMITTED isolation level). As a further twist, there’s a trade-off with setup cost of the allocation order scan against the number of pages that will b read – an allocation order scan will only be used if there’s more than 64 pages to be read.

As the local temp table is not accessible to other connections you get this behaviour without explicitly taking a table lock however the comment about size of table still applies which is why you see the difference in your two cases.

If you need a specific order add an ORDER BY to get scan in key order (with "Ordered = True").