Sql-server – SQL Server chooses unselective index

database-internalsindexoptimizationsql server

I was testing SQL Server indexes and found very strange behavior. Here is my code:

DROP TABLE IF EXISTS  dbo._Test
DROP TABLE IF EXISTS  dbo._Newtest
GO
CREATE TABLE _Test(
ID INT NOT NULL, 
UserSystemID INT NOT NULL, 
Age INT
)
GO
INSERT INTO dbo._Test
        ( ID, UserSystemID, Age )

SELECT TOP 10000000 ABS(CHECKSUM(NEWID())) % 5000000, ABS(CHECKSUM(NEWID())) % 2, ABS(CHECKSUM(NEWID())) % 100
FROM sys.all_columns
CROSS JOIN sys.all_objects a
CROSS JOIN sys.all_objects b
CROSS JOIN sys.all_objects c

; WITH cte AS (
SELECT ID, UserSystemID,  age, ROW_NUMBER() OVER(PARTITION BY ID, UserSystemID ORDER BY GETDATE()) rn
FROM dbo._Test
)

SELECT cte.ID ,
       cte.UserSystemID ,
       cte.Age
INTO _newTest
FROM cte
WHERE cte.rn = 1

CREATE UNIQUE NONCLUSTERED INDEX  IX_test ON dbo._NewTest(ID, UserSystemID) INCLUDE(age)
GO
ALTER TABLE dbo._NewTest ADD CONSTRAINT PK_NewTest PRIMARY KEY CLUSTERED(UserSystemID, ID)
GO

At this point, I have two indexes on the same table and on the same columns. First one is nonclustered and the second one is clustered. The Id column is more selective (about 5000000 unique values) and UserSystemID is not (two unique values).

Then I run the following query to test which index is used:

SELECT id,  UserSystemID, age   
FROM _NewTest
WHERE id = 1502945
AND UserSystemID = 1

It seeks the clustered index. You can see the plan here.

The question is why SQL Server prefers the clustered index instead of the unique nonclustered index.


My leading column of clustered index is much less selective than the other unique nonclustered index. So I expect that the performance must be worse with clustered index but in practice it is not.

Best Answer

Given the unique indexes, your query will select at most one row.

The optimizer knows it will need to descend the index b-tree just once, and will not need to scan forward or backward from that point to find more matches. This is known as a singleton seek (equality test on a unique index).

The current index matching implementation happens to always choose the clustered index when it can use a singleton seek.

The choice between clustered and nonclustered index here is generally not very important. There might be a tiny extra cost as the upper levels of the b-tree are navigated (using binary search or linear interpolation) but this would be challenging to even measure. Remember only the ID and UserSystemID key components are present on non-leaf index pages.

One could argue that wider clustered index leaf pages are less likely to be in memory on average. There are a few other edge case consequences, but I don't see this behaviour being changed any time soon.

But my leading column of clustered index is much less selective than the other unique nonclustered index. So I expect that the performance must be worse with clustered index but in practice it is not.

The selectivity does not matter for equality seek on a compound b-tree index.

Your unique clustered compound index has keys (UserSystemID, id).

To find a row with (UserSystemID = 1 and id = 1502945), SQL Server does not find all rows where UserSystemID = 1, then find rows where id = 1502945. That would be very inefficient.

You can tell how many pages your test query touches using SET STATISTICS IO ON. Your example builds a clustered index with two non-leaf levels. In total, finding the row you want means touching three pages - one at each level of the index.

Rows are ordered in the index by UserSystemID and id. My copy of your demo table has the following layout on the root (top level) page of the clustered index:

root page

Performing a binary search on this page is easy:

  • Start at the middle row.
  • Compare the UserSystemID with the one you are looking for.
    • If it is not equal, continue the binary search in the usual way (choose a new midpoint in the earlier or later rows as appropriate).
    • If equal on UserSystemID, Compare the id with the one you are looking for, and continue the binary search

Following that logic, we will quickly find the child (next lower level) index page the searched keys will be found on if they are present. Repeat the binary search on that page, and so on until we reach the single leaf-level page that must contain the row we are looking for, if it exists.