Sql-server – Execution Plan is NOT using INDEX, It uses Table Scan

execution-planindexoptimizationsql serversql-server-2012

I know when it comes to using an index or a table scan, SQL Server uses statistics to see which one is better.

I have a table with 20 million rows. I have an index on (SnapshotKey, Measure) and this query:

select Measure, SnapshotKey, MeasureBand
from t1
where Measure = 'FinanceFICOScore'
group by Measure, SnapshotKey, MeasureBand

The query returns 500k rows. So the query selects only 2.5% of the table's rows.

The question is why SQL Server does not use the nonclustered index I have, and uses a table scan instead?

Statistics are updated.

Good to mention that the query performance is good though.

Table Scan

Table Scan

Forced Index

Force Index

Table/Index Structure

CREATE TABLE [t1](
    [SnapshotKey] [int] NOT NULL,
    [SnapshotDt] [date] NOT NULL,
    [Measure] [nvarchar](30) NOT NULL,
    [MeasureBand] [nvarchar](30) NOT NULL,
    -- and many more fields
) ON [PRIMARY]

No PK on table, as it is a data warehouse.

CREATE NONCLUSTERED INDEX [nci_SnapshotKeyMeasure] ON [t1]
(
    [SnapshotKey] ASC,
    [Measure] ASC
)

Best Answer

Index seek might not be the best choice if you return many rows and/or the rows are very wide. Lookups can be expensive if your index is not covering. See #2 here.

In your scenario, the query optimizer estimates that performing 50,000 individual lookups will be more expensive than a single scan. The optimizer's choice between scan and seek (with RID lookups for the columns needed by the query, but not present in the nonclustered index) is based on the estimated cost of each alternative.

The optimizer always chooses the lowest cost alternative it considers. If you look at the Estimated Subtree Cost property in the root node of the two execution plans, you will see that the scan plan has a lower estimated cost than the seek plan. As a result, the optimizer chose the scan. That is essentially the answer to your question.

Now, the cost model used by the optimizer is based on assumptions and "magic numbers" that are quite unlikely to match your system's performance characteristics. In particular, one assumption made in the model is that the query starts executing with none of the required data or index pages already in memory. Another is that sequential I/O (expected for a scan) is cheaper than the random I/O pattern assumed for RID Lookups. There are many other such assumptions and caveats, far too many to go into in detail here.

Nevertheless, the cost model as a whole has been shown to produce generally "good enough" plans for most queries, on most database schemas, on most hardware configurations, most of the time, everywhere. That is quite an achievement, if you think about it.

Model limitations and other factors will sometimes mean the optimizer chooses a plan that is not, in fact, "good enough" at all. You report that "performance is good", so that does not seem to be the case here.