Sql-server – Why clustered index scan

execution-plan, sql-server, sql-server-2012, sql-server-express

I have just started learning to optimize my queries and analyze their query plans. I expected this query to produce a nonclustered index seek plus a key lookup.

```
SELECT ct.*
FROM Person.ContactType AS ct
WHERE ct.Name LIKE 'Own%';
```

Instead, it uses a clustered index scan. I don't know why.

I'm working with the AdventureWorks2012 database on SQL Server 2012 Express.
There is a clustered index on the ContactTypeID column and a nonclustered index on the Name column.
There is a third column (ModifiedDate) that is not part of any index. The table contains only 20 rows.

I suspect that the query optimizer chose a clustered index scan because the table has only 20 rows, so scanning the index may be quicker than doing key lookups.

Best Answer

This table is very small!

It has 20 rows of which 2 match the search condition. The table definition contains three columns and two indexes (which both support uniqueness constraints).

```
CREATE TABLE Person.ContactType(
    ContactTypeID int IDENTITY(1,1) NOT NULL,
    Name dbo.Name NOT NULL,
    ModifiedDate datetime NOT NULL,
    CONSTRAINT PK_ContactType_ContactTypeID PRIMARY KEY CLUSTERED(ContactTypeID),
    CONSTRAINT AK_ContactType_Name UNIQUE NONCLUSTERED(Name)
);
```

Running

```
SELECT index_type_desc,
       index_depth,
       page_count,
       avg_page_space_used_in_percent,
       avg_record_size_in_bytes
FROM   sys.dm_db_index_physical_stats(db_id(),
                                      object_id('Person.ContactType'),
                                      NULL,
                                      NULL,
                                      'DETAILED');
```

Shows both indexes only consist of a single leaf page with no upper level pages.

+--------------------+-------------+------------+--------------------------------+--------------------------+
|  index_type_desc   | index_depth | page_count | avg_page_space_used_in_percent | avg_record_size_in_bytes |
+--------------------+-------------+------------+--------------------------------+--------------------------+
| CLUSTERED INDEX    |           1 |          1 | 15.9130219915987               | 62.5                     |
| NONCLUSTERED INDEX |           1 |          1 | 13.1949592290586               | 51.5                     |
+--------------------+-------------+------------+--------------------------------+--------------------------+

Rows on each index page aren't necessarily in index key order but each page has a slot array with the offset of each row on the page. This is maintained in index order.

The nonclustered index covers two out of the three columns (Name as a key column and ContactTypeID as a row locator back to the base table) but is missing ModifiedDate.
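To illustrate what "covers" means here, the following is a minimal sketch rather than part of the original answer: a variant of the query that asks only for the two columns stored in the nonclustered index. Under that assumption no key lookup is needed, so the optimizer is free to answer it from AK_ContactType_Name alone (whether it actually does so is still its decision).

```sql
-- Only Name (index key) and ContactTypeID (row locator) are requested,
-- so the nonclustered index contains everything this query needs.
SELECT ct.ContactTypeID,
       ct.Name
FROM   Person.ContactType AS ct
WHERE  ct.Name LIKE 'Own%';
```

Adding ModifiedDate back to the select list is what makes the index non-covering again.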

You can use index hints to force the NCI seek as below

```
SELECT ct.*
FROM   Person.ContactType AS ct WITH (INDEX = AK_ContactType_Name)
WHERE  ct.Name LIKE 'Own%';
```

But you can see that under SQL Server's cost model this plan is given a higher estimated cost than the competing CI scan (roughly double).
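One way to reproduce this comparison, as a rough sketch: run both forms of the query in a single batch with the actual execution plan enabled in SSMS, then compare the relative batch cost shown for each statement. Turning on SET STATISTICS IO additionally reports the logical reads for each one. Only the two queries come from the question and answer; wrapping them this way is an assumption about how one might check the claim.

```sql
SET STATISTICS IO ON;

-- Plan the optimizer picks on its own (clustered index scan).
SELECT ct.*
FROM   Person.ContactType AS ct
WHERE  ct.Name LIKE 'Own%';

-- Same query forced onto the nonclustered index (seek plus key lookup).
SELECT ct.*
FROM   Person.ContactType AS ct WITH (INDEX = AK_ContactType_Name)
WHERE  ct.Name LIKE 'Own%';

SET STATISTICS IO OFF;
```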


The single-page clustered index scan just needs to read all 20 rows on the page, evaluate the predicate against each of them and return the ones that match.

The single-page nonclustered index range seek could potentially perform a binary search on the slot array to reduce the number of rows evaluated. However, the index does not cover the query, so the plan would also need a potential IO to retrieve the CI page, and for each row returned by the NCI seek it would still need to locate the matching row there to fetch the missing column values.

On my machine, running 1 million iterations of the nonclustered index plan took 15.245 seconds, compared to 11.113 seconds for the clustered index plan. Whilst this is far from double, the plan without the hint was measurably faster.
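The answer does not say how the iterations were timed. The loop below is a hypothetical harness, not the author's method: it assigns the columns to variables so that no result sets are sent to the client, and it assumes dbo.Name is nvarchar(50) as in the stock AdventureWorks schema. The variable names are made up for illustration.

```sql
SET NOCOUNT ON;

DECLARE @i     int       = 0,
        @start datetime2 = SYSDATETIME();
DECLARE @id    int,
        @name  nvarchar(50),   -- assumed base type of dbo.Name
        @date  datetime;

WHILE @i < 1000000
BEGIN
    -- Remove the WITH (...) hint to time the clustered index scan plan instead.
    SELECT @id   = ct.ContactTypeID,
           @name = ct.Name,
           @date = ct.ModifiedDate
    FROM   Person.ContactType AS ct WITH (INDEX = AK_ContactType_Name)
    WHERE  ct.Name LIKE 'Own%';

    SET @i += 1;
END;

SELECT DATEDIFF(MILLISECOND, @start, SYSDATETIME()) AS elapsed_ms;
```

Absolute timings will differ by machine; only the relative difference between the hinted and unhinted runs is of interest.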

Even if the table were orders of magnitude larger, however, you may well still not get your expected plan with lookups.

SQL Server's costing model prefers sequential scans to random IO lookups, and the "tipping point" between choosing a scan of a covering index and a seek with lookups on a non-covering one is often surprisingly low, as discussed in Kimberly Tripp's blog post on the subject.

It is certainly not out of the question that it would choose such a plan for a 10% selective predicate but the clustered index would likely need to be quite a lot wider than the NCI for it to do so.

Related Solutions

Sql-server – Why is SQL Server using a clustered index scan for a self referencing FK cascade delete

It needs to validate that the row you are trying to delete is not a parent of an existing row.

You don't have an index on ParentTestId.

So it must do the scan.

```
CREATE NONCLUSTERED INDEX ix ON [dbo].[Test](ParentTestId);
```

Then you see a seek.

BTW: The 20% estimated cost of the scan is likely to be an underestimate in this case.

The FK validation sits under a left semi join, and SQL Server costs it as though only a partial scan will be needed: it assumes a matching row will be found early and the delete will fail.

Presumably the rows you are actually deleting will succeed more often than not and so a full scan will be required in order to validate that there are no conflicting rows.

Using trace flag 4138 to turn off row goals

```
DELETE FROM dbo.Test
WHERE  TestId = 200
OPTION (QUERYTRACEON 4138);
```

The re-costed plan shows the CI scan at 100% rather than 20% (as it now assumes a full scan will be needed)

This difference in estimated cost is sufficient for the missing index suggestion to show up.

The costs shown in this plan are still not very representative however. You might notice that they add up to 219%.

Also, the overall plan costs of the queries with and without the trace flag are identical at 0.0168268. The full CI scan ought, in fact, to be costed at 0.152373 (0.0485075 + 0.103866), but it seems to be capped at no more than the original plan cost (and the overall plan cost doesn't get adjusted upwards either, hence the incorrect percentages).

Sql-server – Clustered Table Scan Because of “SELECT *”

If you need columns in the output that aren't covered by the index, the optimizer has to make a choice:

1. Perform a table / clustered index scan (therefore all columns are there)
2. Perform a seek, then perform lookups to retrieve the columns not covered

Which way it will choose depends on a variety of things, including how narrow the index is, how many rows match the predicate, etc. You can force a seek with the FORCESEEK hint, but I suspect it will end up performing the same or worse than the scan SQL Server has chosen in your case.
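As a sketch of the FORCESEEK hint mentioned above: the table name dbo.tablename and the columns are borrowed from the view example in this answer, and the predicate on col1 is purely hypothetical, since the original query is not shown. It assumes an index exists with col1 as its leading key.

```sql
-- FORCESEEK asks the optimizer to use only index seek access paths for this table.
-- The WHERE clause is a placeholder for whatever the real query filters on.
SELECT *
FROM   dbo.tablename WITH (FORCESEEK)
WHERE  col1 = 1;
```

Comparing this plan's cost and logical reads against the unhinted plan shows whether the forced seek is actually an improvement.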

Some options:

1. Change the app to run a proper query. I listed this first for a reason.
2. Create a view that selects only the columns you need:
```
CREATE VIEW dbo.myview
WITH SCHEMABINDING
AS
  SELECT col1, col2, col3 FROM dbo.tablename;
```
   Then you can change the app to SELECT * from this view. Or you can get even more creative and rename the original table, and change the name of this view to what the name of the table used to be. Breaking change, obviously; proceed with caution.
3. Add all of the other columns to the key or INCLUDE list for the index. If these are hard-coded values and always the ones used, you may consider a filtered index.
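The last option could look roughly like the following. The index names, the choice of col1 as the key column and the filter value are all hypothetical, chosen only to match the placeholder names used in the view above.

```sql
-- Covering index: col1 is the seek key, col2 and col3 are carried at the
-- leaf level so no lookup into the base table is needed.
CREATE NONCLUSTERED INDEX IX_tablename_col1_covering
ON dbo.tablename (col1)
INCLUDE (col2, col3);

-- Filtered variant: only rows matching the hard-coded value are indexed.
CREATE NONCLUSTERED INDEX IX_tablename_col1_filtered
ON dbo.tablename (col1)
INCLUDE (col2, col3)
WHERE col1 = 1;
```

A filtered index is only considered when the query's predicate matches the filter, so it suits the hard-coded-value case better than parameterized queries.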
