Sql-server – Confused about why a particular index is being used

indexperformancesql server

I have a table (with 170M rows) that looks as follows:

CREATE TABLE [dbo].[Panel]
(
    [SubId] [varchar](15) NOT NULL,
    [LineageId] [int] NULL,
    [Buck] [varchar](20) NULL,
    [Lot] [varchar](20) NULL,
    [GlassType] [varchar](20) NULL,
    [ETA] [varchar](200) NULL,
 CONSTRAINT [PK_Panel] PRIMARY KEY CLUSTERED 
(
    [SubId] ASC
)

99% of the queries against this table reference SubId either in the Where clause or join. One of our DBAs told me he could make all those queries and joins perform better by creating the following index:

CREATE UNIQUE NONCLUSTERED INDEX [IX-Panel-SubID-I-LineageID] ON [dbo].[Panel]
(
    [SubId] ASC
)
INCLUDE ([LineageId])

When he told me this, I thought he was nuts. But I just checked index usage since this index was created and found the following:

PK_Panel (232,394 seeks / 2,133 scans)
IX-Panel-SubID-I-LineageID (25,528 seeks / 3644 scans)

I was a bit shocked to see this. Under what circumstances would this new index every get used? Why would SQL Server ever select it?

Or maybe a better question would be, why would SQL Server select the new index to do a seek, instead of the clustered index? Approximately 25K times, it thought seeking on the new index was a better choice.

In case this helps, the LineageId essentially indicates where a panel was created and there are ~35 distinct values it could contain.

Best Answer

Creating an index on the column involved in the clustering key may seem a little strange. One asks why create another index, when one already exists?

The clustered index is the table. That is to say your clustered index for this table is similar to an index like:

CREATE INDEX IX_Panel
ON dbo.Panel(SubId ASC)
INCLUDE (
    LineageId
    , Buck
    , Lot
    , GlassType
    , ETA
);

Pretty clearly, this is not the same as an non-clustered index on the clustering key, which in your case only "includes" the LineageId column:

CREATE UNIQUE NONCLUSTERED INDEX [IX-Panel-SubID-I-LineageID] ON [dbo].[Panel]
(
    [SubId] ASC
)
INCLUDE ([LineageId]);

As a test, I created a mock-up of your table, then insert over 700,000 rows into it:

INSERT INTO dbo.Panel(LineageId, Buck, Lot, GlassType, ETA)
SELECT (ROW_NUMBER() OVER (ORDER BY o1.object_id, o2.object_id) % 35)
    , SUBSTRING(o2.name, 1, 15)
    , 'lot'
    , 'GlassType'
    , o3.name + o2.name
FROM sys.objects o1
    , sys.objects o2
    , sys.objects o3;

I then ran the following query to see stats about both indexes:

SELECT o.name
    , ps.index_id
    , ps.index_type_desc
    , ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(),OBJECT_ID('dbo.Panel'),-1,0, NULL) ps
INNER JOIN sys.objects o ON ps.object_id = o.object_id;

The results of the query above are:

Clearly, the non-clustered index is smaller than the clustered index. In this case about 1/5th the size. Being a smaller index means the query optimizer will choose to use it when the index fulfills the requirements of the query, in one way or another.

For instance,

SELECT SubId
FROM dbo.Panel;

will only need to read 1,304 pages from disk by scanning the non-clustered index, instead of having to read 7,105 pages to scan the clustered-index.

On the other hand, for a query that references columns that are not included in the non-clustered index, SQL Server will likely use the clustered index since it fulfills the requirements of the query.

Take for instance, the following query, where you might intuitively think the query optimizer might pick the non-clustered index since it can use that to fulfill the WHERE clause, however the simple act of having to look up the Buck column means it is faster to simply seek the clustered index for the values in the WHERE, then return the 3 columns in the SELECT clause:

SELECT SubId
    , LineageId
    , Buck
FROM dbo.Panel
WHERE LineageId = 6
    AND SubId >= 27
    AND SubId <= 42;

This is a bit of a simplification, I would highly recommend looking at Brent Ozar's Index pages

Best Answer

Related Solutions

Sql-server – Why did creating this new index improve performance so much when existing index included all columns in new index

Sql-server – Index fragmentation increase as more row being deleted

Related Question