Sql-server – Why would optimiser choose Clustered Index + Sort instead of Non-Clustered Index

Given the next example:

IF OBJECT_ID('dbo.my_table') IS NOT NULL
    DROP TABLE [dbo].[my_table];
GO

CREATE TABLE [dbo].[my_table]
(
    [id]    int IDENTITY (1,1)  NOT NULL PRIMARY KEY,
    [foo]   int                 NULL,
    [bar]   int                 NULL,
    [nki]   int                 NOT NULL
);
GO

/* Insert some random data */
INSERT INTO [dbo].[my_table] (foo, bar, nki)
SELECT TOP (100000)
    ABS(CHECKSUM(NewId())) % 14,
    ABS(CHECKSUM(NewId())) % 20,
    n = CONVERT(INT, ROW_NUMBER() OVER (ORDER BY s1.[object_id]))
FROM 
    sys.all_objects AS s1 
CROSS JOIN 
    sys.all_objects AS s2
GO

CREATE UNIQUE NONCLUSTERED INDEX [IX_my_table]
    ON [dbo].[my_table] ([nki] ASC);
GO

If I fetch all records ordered by [nki] (Non-clustered index):

SET STATISTICS TIME ON;
SELECT id, foo, bar, nki FROM my_table ORDER BY nki;
SET STATISTICS TIME OFF;

SQL Server Execution Times: CPU time = 266 ms, elapsed time = 493 ms

Optimiser chooses the clustered index and then applies a Sort algorithm.

Execution plan

But if I force it to use the non-clustered index:

SET STATISTICS TIME ON;
SELECT id, foo, bar, nki FROM my_table WITH(INDEX(IX_my_TABLE));
SET STATISTICS TIME OFF;

SQL Server Execution Times: CPU time = 311 ms, elapsed time = 188 ms

Then it uses non-clustered index with a Key Lookup:

Execution plan

Obviously if the non-clustered index is transformed into a covering index:

CREATE UNIQUE NONCLUSTERED INDEX [IX_my_table]
    ON [dbo].[my_table] ([nki] ASC)
    INCLUDE (id, foo, bar);
GO

Then it uses only this index:

SET STATISTICS TIME ON;
SELECT id, foo, bar, nki FROM my_table ORDER BY nki;
SET STATISTICS TIME OFF;

SQL Server Execution Times: CPU time = 32 ms, elapsed time = 106 ms

Execution plan

Question

Why does SQL Server use the clustered index plus a sort algorithm instead of using a non-clustered index even if the execution time is 38% faster in the latter case?

Best Answer

Because SQL Server uses a cost-based optimizer based on statistics, not runtime info.

During the cost estimation process for this query, it does actually evaluate the lookup plan, but estimates it will take more effort. (Note the "Estimated Subtree Cost" when hovering over SELECT in the execution plan). That's not necessarily a bad assumption either - on my test machine, the lookup plan takes 6X the CPU of the sort/scan.

Look to Rob Farley's answer as to why SQL Server might cost the lookup plan higher.

Question

Best Answer

Related Solutions

Sql-server – What affects the speed of a count(*) besides the number of records

Sql-server – Large time difference between two almost identical simple queries…why

Related Question