SQL Server – Threshold for sort spill and random cardinality estimates

optimization, query-performance, sorting, sql-server, sql-server-2014

I wanted to test tempdb spill warnings, so I ran the following script on SQL Server 2014:

USE tempdb

IF OBJECT_ID('tempdb..tblTest') IS NOT NULL DROP TABLE tblTest

CREATE TABLE tblTest
(
          c1 INT         PRIMARY KEY CLUSTERED,
          c2 INT        ,
          c3 CHAR (1000)
);

GO
SET NOCOUNT ON;

BEGIN TRANSACTION;

DECLARE @i AS INT;

SET @i = 1;

WHILE @i <= 10000
          BEGIN
                    INSERT  INTO tblTest (c1, c2, c3)
                    VALUES              (@i, @i, 'a');
                    SET @i = @i + 1;
          END

COMMIT TRANSACTION;

GO
UPDATE STATISTICS dbo.tblTest
GO
SET STATISTICS XML ON;
GO
--no tempdb spill (SQL Server 2014)
--in SQL Server 2012 the "Estimated number of rows" is different every time you run the whole script
SELECT   *
FROM     tblTest
WHERE    c1 <= 5948
ORDER BY c2
OPTION (MAXDOP 1);

GO
SET STATISTICS XML OFF;

SET STATISTICS XML ON;
GO
--tempdb spill (SQL Server 2014)
SELECT   *
FROM     tblTest
WHERE    c1 <= 5949
ORDER BY c2
OPTION (MAXDOP 1);

GO
SET STATISTICS XML OFF;

(the core of the query was based on material from Microsoft)

1) The first thing I am curious about is what causes the tempdb spill during the sort operation at this specific value of the c1 column. All the estimates are correct and the number of pages read is the same for both queries (see the STATISTICS IO snippet after these questions), so why does the second query spill? (In other words, why is the memory grant for the latter query so much higher?)

2) I have tested this query on SQL Server 2012 and got very interesting behaviour as well. At first I was unable to hit the same thresholds, so I ran the script repeatedly and noticed that the estimated number of rows was always different from the previous run. My question is: why is the estimated number of returned rows different every time I run the very same script (which creates the table, inserts the rows, and updates its own statistics via a full scan)?
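
(For reference: the page-count comparison mentioned in question 1 can be done with STATISTICS IO, which reports logical reads per statement. This snippet is not part of the original script.)

SET STATISTICS IO ON;
GO
SELECT * FROM tblTest WHERE c1 <= 5948 ORDER BY c2 OPTION (MAXDOP 1);
SELECT * FROM tblTest WHERE c1 <= 5949 ORDER BY c2 OPTION (MAXDOP 1);
GO
SET STATISTICS IO OFF;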

Best Answer

  1. The amount of memory needed to perform a sort is not as simple as computing the raw size of the input data. The main sort algorithm used by SQL Server is an internal variation on merge sort, and the implementation includes extra steps such as key normalization so that all combinations of column data types can be sorted efficiently. Because of these extra steps, it is not easy to predict in advance exactly how much memory is needed to avoid a spill at runtime; the memory grant is an estimate. Spilling is part of the design, and you should not be overly concerned by small spills in edge cases like this. (A sketch of how to compare the two queries' compiled grants from the plan cache appears after this list.)

  2. SQL Server 2014 includes a new cardinality estimation module, which is used if the context database is compatibility level 120. You use tempdb for your test, which will be 120 level by default. You can get the pre-2014 CE behaviour by changing the compatibility level, or using trace flag 9481. Regarding the different number of estimated rows, the statistics may be sampled unless you use the WITH FULLSCAN option with UPDATE STATISTICS.