Sql-server – Why scan clustered index instead seek nonclustered index

clustered-indexexecution-planindexnonclustered-indexsql server

There is SQL script:

CREATE TABLE users 
  ( 
     id        INT, 
     firstname VARCHAR(50), 
     surname   VARCHAR(50) 
  ); 

CREATE CLUSTERED INDEX ix_users_id 
  ON users (id); 

CREATE NONCLUSTERED INDEX ix_users_firstname 
  ON users (firstname); 

SELECT firstname, 
       surname 
FROM   users 
WHERE  firstname = 'John';

I do not understand why for the above SELECT request, Engine of SQL Server 2019 selected follows Execution Plan:

Why the Clustered Index is scanned? I thought, faster is:

seek Nonclustered Index;
move on a Clustered Index by Clustered Index Pointer, which
storing in Leaf Node of Nonclustered Index;
and take rest surname value from there.

Best Answer

The key idea here is that your index contains (firstname,id), but not surname. So the options for this query

SELECT firstname, 
       surname 
FROM   users 
WHERE  firstname = 'John';

are

1) Scan the clustered index

2) Seek the non-clustered index, and then for every matching row in the index, Seek on the Clustered Index to find the surname. It's this "bookmark lookup" that is the most expensive part of the query, and if a reasonable percentage of your users are named 'John', it may well be cheaper just to scan the clustered index.

This is why we have indexes with included columns. You can add surname to the index to enable this query to seek on the non-clustered index, and avoid the bookmark lookup. The index would then be a "covering index" for the query. eg

CREATE NONCLUSTERED INDEX ix_users_firstname 
  ON users (firstname) 
   include (surname);

Related Solutions

Sql-server – Clustered vs Nonclustered Index

Since we are talking about the clustered index, just because you defined the CI key column as ID, you still have the DeletedDate data in the leaf data pages of the index. That's the nature of the clustered index: It is the table data.

Because you are typically having queries that look like:

select *
from YourTable
where DeletedDate is null;

You will likely benefit from a filtered index.

create nonclustered index IX_YourFilteredNci
on YourTable(<Key Columns Here>)
where DeletedDate is not null;
go

I didn't explicitly put the key columns here (and nonkey columns through the use of the INCLUDE clause) because you didn't publish the DDL of your table.

As in my comment above to your question, the choice of key columns (not just columns, but also the order of the columns) will largely depend on your workload and the typical queries that would be using this index.

If you are looking to cover your query(ies), then you would need to ensure that the index satisfies all of the data required of the query(ies). Not to mention, if you have other WHERE clauses (besides your NULL check on DeletedDate) or joins to consider, then the order of your key columns can be the deciding factor between a scan or a seek. And even though it is filtered, and depending on how much data you have in the index, the penalty could be considerable.

Sql-server – Confused about why a particular index is being used

Creating an index on the column involved in the clustering key may seem a little strange. One asks why create another index, when one already exists?

The clustered index is the table. That is to say your clustered index for this table is similar to an index like:

CREATE INDEX IX_Panel
ON dbo.Panel(SubId ASC)
INCLUDE (
    LineageId
    , Buck
    , Lot
    , GlassType
    , ETA
);

Pretty clearly, this is not the same as an non-clustered index on the clustering key, which in your case only "includes" the LineageId column:

CREATE UNIQUE NONCLUSTERED INDEX [IX-Panel-SubID-I-LineageID] ON [dbo].[Panel]
(
    [SubId] ASC
)
INCLUDE ([LineageId]);

As a test, I created a mock-up of your table, then insert over 700,000 rows into it:

INSERT INTO dbo.Panel(LineageId, Buck, Lot, GlassType, ETA)
SELECT (ROW_NUMBER() OVER (ORDER BY o1.object_id, o2.object_id) % 35)
    , SUBSTRING(o2.name, 1, 15)
    , 'lot'
    , 'GlassType'
    , o3.name + o2.name
FROM sys.objects o1
    , sys.objects o2
    , sys.objects o3;

I then ran the following query to see stats about both indexes:

SELECT o.name
    , ps.index_id
    , ps.index_type_desc
    , ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(),OBJECT_ID('dbo.Panel'),-1,0, NULL) ps
INNER JOIN sys.objects o ON ps.object_id = o.object_id;

The results of the query above are:

Clearly, the non-clustered index is smaller than the clustered index. In this case about 1/5th the size. Being a smaller index means the query optimizer will choose to use it when the index fulfills the requirements of the query, in one way or another.

For instance,

SELECT SubId
FROM dbo.Panel;

will only need to read 1,304 pages from disk by scanning the non-clustered index, instead of having to read 7,105 pages to scan the clustered-index.

On the other hand, for a query that references columns that are not included in the non-clustered index, SQL Server will likely use the clustered index since it fulfills the requirements of the query.

Take for instance, the following query, where you might intuitively think the query optimizer might pick the non-clustered index since it can use that to fulfill the WHERE clause, however the simple act of having to look up the Buck column means it is faster to simply seek the clustered index for the values in the WHERE, then return the 3 columns in the SELECT clause:

SELECT SubId
    , LineageId
    , Buck
FROM dbo.Panel
WHERE LineageId = 6
    AND SubId >= 27
    AND SubId <= 42;

This is a bit of a simplification, I would highly recommend looking at Brent Ozar's Index pages

Best Answer

Related Solutions

Sql-server – Clustered vs Nonclustered Index

Sql-server – Confused about why a particular index is being used

Related Question