As I was putting some test sets of data together, I noticed some funny behavior with temp tables. When working with large sets of data in clustered temp tables that are populated via a parallel execution plan, the clustered key does not look to be honored when selecting data. This issue also seems to affect all versions of SQL Server that I've tested (include vNext).
Here's a dbfiddle.uk example of the test. You may have to execute it a couple of times to get the result I am finding, but it shouldn't take more than one or two executions to yield the same results. Additionally, this is the local execution plan I'm getting on my environment which shows that the only difference between the large and small data sets is the way data is fed into the tables (e.g. parallel vs serial plan).
If you want to play-at-home, here's the test I'm running:
-- Large Data Set
CREATE TABLE #tmp
(
ID INT PRIMARY KEY CLUSTERED
)
INSERT INTO #tmp
-- Purposely insert in reverse order
SELECT TOP 100 PERCENT RN
FROM
(
SELECT TOP (10000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
) x
ORDER BY RN DESC
-- Smaller Data Set
CREATE TABLE #tmp2
(
ID INT PRIMARY KEY CLUSTERED
)
INSERT INTO #tmp2
-- Purposely insert in reverse order
SELECT TOP 100 PERCENT RN
FROM
(
SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
) x
ORDER BY RN DESC
-- Large Record Set
-- Clustered Key Not Honored*
SELECT TOP 10 *
FROM #tmp
-- Small Record Set
-- Clustered Key Honored
SELECT TOP 10 *
FROM #tmp2
DROP TABLE #tmp
DROP TABLE #tmp2
I've not found any references indicating this is expected behavior, but before I submit a connect item, I first wanted to reach out and confirm this isn't a localized problem. Can someone either point me to documentation identifying this is expected behavior or alternatively confirm this is, in-fact a bug?
EDIT: In response to the comments about not including an ORDER BY
clause, I was always under the assumption the TOP keyword returned the data in the order in which it was inserted, which should, in this case, be the order dictated by the clustered key. When running the same statement against a formal table, the expected behavior is returned:
-- Large Data Set with a Formal Data Table
CREATE TABLE tmp
(
ID INT PRIMARY KEY CLUSTERED
)
INSERT INTO tmp
-- Purposely insert in reverse order
SELECT TOP 100 PERCENT RN
FROM
(
SELECT TOP (10000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
) x
ORDER BY RN DESC
-- Large Record Set
-- Clustered Key Not Honored*
SELECT TOP 10 *
FROM tmp
DROP TABLE tmp
(6325225 row(s) affected)
(1 row(s) affected)
ID
-----------
1
2
3
4
5
6
7
8
9
10
(10 row(s) affected)
(1 row(s) affected)
Even the execution plans are the same, so why the different result sets between a temp table and a formally defined table?
Finally, a shout out to Joe Obbish as I gratuitously ripped off his CROSS JOIN approach to build large sets of test data as it's quite efficient!
Best Answer
There is no guarantee of
ORDER
withoutORDER BY
.The execution plan for both has "Ordered = False".
This means you may get the results in key order but equally may not.
Specifically see When can allocation order scans be used?
As the local temp table is not accessible to other connections you get this behaviour without explicitly taking a table lock however the comment about size of table still applies which is why you see the difference in your two cases.
If you need a specific order add an
ORDER BY
to get scan in key order (with"Ordered = True"
).