SQL Server Performance – Clustered Table Scan Due to SELECT *

Tags: index, nonclustered-index, performance, query-performance, sql-server

I have a Records table with over 100 columns and very many rows, and a nonclustered index on 5 fields based on my access paths:

CREATE NONCLUSTERED INDEX [IX_Records_CustomerID]
ON [dbo].[Records] (
    [CustomerID] ASC, -- int
    [IsInvalid] ASC, -- int
    [IsProcessed] ASC, -- bit
    [IsRejected] ASC, -- bit
    [RecordName] ASC -- varchar(12)
);

The 5 fields do not include the primary key RecordID, which is the column in the clustered index.

Here is my poorly performing query:

SELECT * FROM Records WHERE CustomerID IN (181, 283, 505)

The execution plan shows that it performs a Clustered Index Scan, which I understand is because I'm selecting columns that are not included in the index. In Management Studio, I change the query to:

SELECT CustomerID, IsInvalid, IsProcessed, IsRejected, RecordName FROM Records 
    WHERE CustomerID IN (181, 283, 505)

And the execution plan shows an Index Seek, and the query execution time drops from 44 seconds to 2 seconds. However, I lack the liberty in the application to replace the * with only the columns I need and have included in my index.

Is there any way around the clustered index scan when I'm locked into SELECT *?

Best Answer

If you need columns in the output that aren't covered by the index, the optimizer has to make a choice:

- Perform a table / clustered index scan (so all columns are already there)
- Perform a seek, then perform lookups to retrieve the columns not covered

Which way it will choose depends on a variety of things, including how narrow the index is, how many rows match the predicate, etc. You can force a seek with the FORCESEEK hint, but I suspect it will end up performing the same or worse than the scan SQL Server has chosen in your case.
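To check that suspicion rather than guess, the two plans can be compared side by side. This is only a sketch against the table from the question; it assumes SQL Server 2008 or later (where the FORCESEEK table hint exists) and that the hint can be added at all, which may not be possible inside the application:

```sql
-- Report logical reads for each statement so the two plans can be compared.
SET STATISTICS IO ON;

-- Plan chosen by the optimizer (a clustered index scan in the question).
SELECT *
FROM dbo.Records
WHERE CustomerID IN (181, 283, 505);

-- Same query restricted to seek operations. The columns missing from the
-- nonclustered index then have to be fetched with a key lookup per matching row.
SELECT *
FROM dbo.Records WITH (FORCESEEK)
WHERE CustomerID IN (181, 283, 505);

SET STATISTICS IO OFF;
```

If the forced seek reports more logical reads or a longer elapsed time than the scan, the optimizer's original choice was the cheaper one for this data.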

Some options:

1. Change the app to run a proper query. I listed this first for a reason.
2. Create a view that selects only the columns you need:
```
CREATE VIEW dbo.myview
WITH SCHEMABINDING
AS
  SELECT col1, col2, col3 FROM dbo.tablename;
```
   Then you can change the app to SELECT * from this view. Or you can get even more creative: rename the original table, then give the view the name the table used to have. Breaking change, obviously; proceed with caution.
3. Add all of the other columns to the key or INCLUDE list for the index. If the CustomerID values in the query are hard-coded and always the same ones, you may consider a filtered index.
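As a rough illustration of the last option, a filtered index for the three customers in the question might look like the sketch below. The index name is made up, and the INCLUDE list only shows the four columns named in the question; for SELECT * to be covered, every remaining column of the table would have to be listed as well, which makes the index about as wide as the table for those customers.

```sql
-- Hypothetical index name; filter values taken from the query in the question.
CREATE NONCLUSTERED INDEX IX_Records_CustomerID_Filtered
ON dbo.Records (CustomerID)
INCLUDE (
    IsInvalid,
    IsProcessed,
    IsRejected,
    RecordName
    -- , ...every other column the application reads via SELECT *
)
WHERE CustomerID IN (181, 283, 505);
```

The optimizer generally matches a filtered index only when the query's predicate is known at compile time to fall inside the filter, so this is most likely to help when the query sends the same literal values rather than parameters.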

Related Solutions

Clustered Index / Index Oriented Table performance in join, worse than nonclustered

You are correct, Erik. When a large portion of the leaf level of the clustered index needs to be read, the size of the other columns affects the amount of data that has to be read, since the leaf level of a clustered index contains the table pages.

Nonclustered indexes contain the clustered index key values so that a lookup can be performed when a column that is not in the nonclustered index needs to be fetched. The optimizer can also take advantage of this and read the clustered index key values directly from the nonclustered index when it decides that is cheaper.
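Applied to the Records table from the question above (an illustration only, assuming RecordID is the sole clustered index key as described there), this means a query can return RecordID from the nonclustered index even though RecordID is not listed in the index definition:

```sql
-- RecordID is not a declared column of IX_Records_CustomerID, but as the
-- clustered index key it is stored in every nonclustered index row.
-- This query can therefore be answered from the nonclustered index alone,
-- with no key lookups into the clustered index.
SELECT RecordID, CustomerID, RecordName
FROM dbo.Records
WHERE CustomerID IN (181, 283, 505);
```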

SQL Server Query Performance – Optimizing Group By with Many Columns

The non-clustered index you have tested is not the best for this query. It can be used for the WHERE clause and for doing an index scan instead of a full table scan but it cannot be used for the GROUP BY.

The best possible index would have to be a filtered index (SQL Server's name for a partial index), so that rows excluded by the WHERE clause are left out, with all the columns used in the GROUP BY as key columns and all the other columns used in the SELECT in the INCLUDE list:

CREATE INDEX special_ix 
  ON dbo.Commissions_Output
    ( company, location, account, 
      salesroute, employee, producttype, 
      item, loadjdate, commissionrate ) 
INCLUDE 
  ( [Extended Sales Price], [Delivered Qty] ) 
WHERE 
  ( [Extended Sales Price] <> 0 ) ;
