Sql-server – Why SQL Server uses Non-Clustered but not Clustered Index

clustered-indexindexnonclustered-indexsql serversql server 2014

I have a table with 145 million rows

CREATE TABLE [dbo].[RFTest](
    [SnapshotKey] [int] NOT NULL,
    [SnapshotDt] [datetime] NOT NULL,
    [LoanNum] [int] NOT NULL,
    [GLSourceSystem] [varchar](10) NOT NULL,
    [FlowDescription] [varchar](30) NULL,
    [Account] [varchar](30) NULL,

    --- plus 20 more column
)

The table is partitioned on SnapshotDt.

I added following indexes on my table:

create clustered index ci on RFTest (SnapshotDt, SnapshotKey, LoanNum)

create nonclustered index nci on RFTest (SnapshotDt, SnapshotKey, LoanNum)
    include ([GLSourceSystem],[Account],[FlowDescription])

I ran below query: (I use top 100 for test as the whole table will take long time if I wanted to run it)

select top 100 *
from RFTest with (index(ci)) -- force index
where LoanNum = 2712

select top 100 *
from RFTest 
where LoanNum = 2712

LoanNum column exists in both indexes, part of key in clustered and included in non-clustered.

Execution plan shows the engine choose non-clustered "nci" index, NOT the clustered one.

See Execution plan here

I like to know why.

Clarification:

To me in both case SQL read the same amount of data. and LoanNum is in both index and BTW, the LoanNum is part of key so It seems to me that it makes more sense if it use clustered index.

The indexes are exactly as I posted. There were some comments in queries, when I captured plan. the query you see in the post is correct. I don't want to keep both index, I was trying to see which one performs better and the question came to me.

Best Answer

The optimizer has a choice between two main strategies:

  1. Scan the table (the clustered index) checking every row to see if LoanNum = 2712.
  2. Scan & Lookup
    • Scan the nonclustered index to find rows where LoanNum = 2712
    • Look up the column data for the matched rows not covered by the nonclustered index.

The key point is that the nonclustered index is smaller, so scanning it is expected to be cheaper. This might seem counterintuitive because the clustered index definition has the same keys, and the nonclustered index has included columns, but the point is that the clustered index includes all columns stored in-row - the leaf (lowest) level of the clustered index literally is the in-row data.

For a small number of expected matches, the cost saving from scanning the smaller index is more than enough to compensate for the key lookups.

Incidentally, you may find that removing the WHERE 1 = 1 from your query causes the optimizer to choose the clustered index scan. The (redundant) constant-to-constant comparison prevents SQL Server parameterizing the query, so estimates are based on statistical information about LoanNum 2712. If the query is parameterized, SQL Server will use the average distribution of LoanNum values, which might result in a higher number of expected rows, and a change of plan choice.

See also: