I have a table (with 170M rows) that looks as follows:
CREATE TABLE [dbo].[Panel]
(
[SubId] [varchar](15) NOT NULL,
[LineageId] [int] NULL,
[Buck] [varchar](20) NULL,
[Lot] [varchar](20) NULL,
[GlassType] [varchar](20) NULL,
[ETA] [varchar](200) NULL,
CONSTRAINT [PK_Panel] PRIMARY KEY CLUSTERED
(
[SubId] ASC
)
99% of the queries against this table reference SubId either in the Where clause or join. One of our DBAs told me he could make all those queries and joins perform better by creating the following index:
CREATE UNIQUE NONCLUSTERED INDEX [IX-Panel-SubID-I-LineageID] ON [dbo].[Panel]
(
[SubId] ASC
)
INCLUDE ([LineageId])
When he told me this, I thought he was nuts. But I just checked index usage since this index was created and found the following:
PK_Panel (232,394 seeks / 2,133 scans)
IX-Panel-SubID-I-LineageID (25,528 seeks / 3644 scans)
I was a bit shocked to see this. Under what circumstances would this new index every get used? Why would SQL Server ever select it?
Or maybe a better question would be, why would SQL Server select the new index to do a seek, instead of the clustered index? Approximately 25K times, it thought seeking on the new index was a better choice.
In case this helps, the LineageId essentially indicates where a panel was created and there are ~35 distinct values it could contain.
Best Answer
Creating an index on the column involved in the clustering key may seem a little strange. One asks why create another index, when one already exists?
The clustered index is the table. That is to say your clustered index for this table is similar to an index like:
Pretty clearly, this is not the same as an non-clustered index on the clustering key, which in your case only "includes" the
LineageId
column:As a test, I created a mock-up of your table, then insert over 700,000 rows into it:
I then ran the following query to see stats about both indexes:
The results of the query above are:
Clearly, the non-clustered index is smaller than the clustered index. In this case about 1/5th the size. Being a smaller index means the query optimizer will choose to use it when the index fulfills the requirements of the query, in one way or another.
For instance,
will only need to read 1,304 pages from disk by scanning the non-clustered index, instead of having to read 7,105 pages to scan the clustered-index.
On the other hand, for a query that references columns that are not
included
in the non-clustered index, SQL Server will likely use the clustered index since it fulfills the requirements of the query.Take for instance, the following query, where you might intuitively think the query optimizer might pick the non-clustered index since it can use that to fulfill the
WHERE
clause, however the simple act of having to look up theBuck
column means it is faster to simply seek the clustered index for the values in theWHERE
, then return the 3 columns in theSELECT
clause:This is a bit of a simplification, I would highly recommend looking at Brent Ozar's Index pages