SQL Server – Query Optimizer using a clustered index seek at the wrong time

sql-server, sql-server-2012

I am supporting a SQL Server 2012 Enterprise Edition instance with SP3. One of the tables in our ERP database has 8 million rows. It has a clustered index and one non-clustered index. For simplicity, let's say the clustered index is on column Size and the non-clustered index is on columns Size, Weight, Color.

We have a query that goes like:

select top 1 * from table_name where size=@P1 and weight=@P2 and plant=@P3

It's a Prepared object type. Historically, the query optimizer has done an index seek on the non-clustered index followed by a key lookup on the clustered index.

Earlier this week, users started complaining that database response times seemed slower than normal.

When I looked at the plan cache, I found that the query optimizer had started doing an index seek on the clustered index instead. You would think it should be faster but the I/O cost is over 200. When I look at the table, there is only one distinct Size value (10) in the table. So now, SQL Server has to traverse the 8 million rows to find the second and third fields in the predicate.

I ran DBCC FREEPROCCACHE, and when I looked at the plan cache again it had gone back to using the non-clustered index, which made the query fast again and the users happy. I am able to reproduce the issue by clearing the cache and running the query with different parameters.
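For reference, this is roughly the kind of query I use to inspect the plan cache (a sketch only; the text filter is just one way to find the statement):

SELECT cp.usecounts,
       cp.objtype,        -- shows up as 'Prepared' for this statement
       st.text,
       qp.query_plan      -- open the XML to see whether the seek uses the CI or the non CI
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE st.text LIKE '%table_name%'
  AND st.text NOT LIKE '%dm_exec_cached_plans%';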

Would this be a SQL Server query optimizer bug?

Best Answer

It looks like query performance sometimes suffers due to parameter sniffing combined with the row goal introduced by your TOP 1. I'll create some sample data similar to your table to illustrate what might be going on:

CREATE TABLE table_name (
size INT NOT NULL,
[weight] INT NULL,
[location] INT NULL,
INDEX CIX CLUSTERED (size)
);

-- insert around 6.5 million rows, lots of skew for weight column
INSERT INTO table_name WITH (TABLOCK)
SELECT 10
, FLOOR(1000 * SQRT(1 - (ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) / 6500000.)))
, 0
FROM master..spt_values t1
CROSS JOIN master..spt_values t2;

CREATE INDEX IX ON table_name (size, [weight]);

CREATE STATISTICS table_name__weight ON table_name ([weight]) WITH FULLSCAN;
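If you want to view the histogram yourself, DBCC SHOW_STATISTICS with the HISTOGRAM option returns just the histogram steps:

DBCC SHOW_STATISTICS ('dbo.table_name', table_name__weight) WITH HISTOGRAM;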

Here is an abbreviated histogram for the weight column:

╔══════════════╦════════════╦═════════╦═════════════════════╦════════════════╗
║ RANGE_HI_KEY ║ RANGE_ROWS ║ EQ_ROWS ║ DISTINCT_RANGE_ROWS ║ AVG_RANGE_ROWS ║
╠══════════════╬════════════╬═════════╬═════════════════════╬════════════════╣
║ 145          ║ 0          ║ 947     ║ 0                   ║ 1              ║
║ 172          ║ 27219      ║ 1139    ║ 26                  ║ 1046.885       ║
║ 190          ║ 20395      ║ 1261    ║ 17                  ║ 1199.706       ║
║ 207          ║ 21119      ║ 1379    ║ 16                  ║ 1319.938       ║
║ ...          ║ ...        ║ ...     ║ ...                 ║ ...            ║
║ 996          ║ 0          ║ 77686   ║ 0                   ║ 1              ║
║ 997          ║ 0          ║ 92220   ║ 0                   ║ 1              ║
║ 998          ║ 0          ║ 120274  ║ 0                   ║ 1              ║
║ 999          ║ 0          ║ 290616  ║ 0                   ║ 1              ║
╚══════════════╩════════════╩═════════╩═════════════════════╩════════════════╝

For example, the table has 947 rows with a weight of 145 but 290616 rows with a weight of 999. Those filter values will be used later.
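Those counts are easy to sanity-check directly against the sample table:

-- verify the EQ_ROWS values from the histogram
SELECT [weight], COUNT(*) AS row_count
FROM table_name
WHERE [weight] IN (145, 999)
GROUP BY [weight];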

You said:

You would think it should be faster but the I/O cost is over 200.

The first important thing to realize is that not all clustered index scans or seeks are costed equally. SQL Server may report the same I/O and CPU cost for an operator, but it scales down the operator's total cost depending on how many rows it expects to read. Consider the following two trivial queries:

select top 1 [location]
from table_name; 

select top 9999999 [location]
from table_name; 

Here's the query plan:

[query plan image: trivial scans]

Obviously the first query should be cheaper, yet the query plans look the same. The operator cost for the first scan is 0.0032831 optimizer units and the operator cost for the second scan is 24.1525 optimizer units. Both scans have the same reported I/O cost, which is confusing. The physical operation for both queries may be a scan, but the first scan can stop as soon as it finds its first row. The row goal introduced by the TOP 1 reduces the subtree cost of the first query but not the second one.
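If you want to compare those costs yourself without executing anything, capturing the estimated plans is enough (the exact numbers will vary with your data and build of SQL Server):

SET SHOWPLAN_XML ON;
GO
-- neither statement executes; SQL Server returns the estimated plan for each one instead
select top 1 [location] from table_name;
select top 9999999 [location] from table_name;
GO
SET SHOWPLAN_XML OFF;
GO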

You also said:

When I look at the table, there is only one distinct Size value (10) in the table. So now, SQL Server has to traverse the 8 million rows to find the second and third fields in the predicate.

Having just one distinct value for size actually favors the clustered index seek plan. The same principle described above applies when you have a filter: SQL Server may estimate, based on statistics, that it won't need to read all of the pages from the table to find a single row that matches the filter. The more often the filter condition is met, the fewer rows SQL Server expects to scan before finding a match. For example, the following queries have different costs:

select top 1 [location]
from table_name 
where size=10 and [weight]=999;

select top 1 [location]
from table_name  WITH (INDEX(0))
where size=10 and [weight]=145;

I'm using the INDEX(0) hint to force the second query to use the clustered index. Here's the query plan:

[query plan image: scans with filter]

The first query has a lower cost because more rows in the table have a weight of 999 than a weight of 145, so SQL Server expects to read fewer rows before it finds the first match. The actual query plan in SQL Server 2016 shows exactly how many rows were read before a match was found. The first query is the best possible case because SQL Server read only one row to find a match:

[query plan image: scan 1 row]

The second query is nearly the worst possible case because SQL Server scanned almost all rows to find a match:

[query plan image: scan all rows]

Due to how the data was loaded, we ended up with very inaccurate estimated costs: the row goal costing assumes that matching rows are spread evenly through the table, but the weight values were inserted in descending order, so the rows with a weight of 145 sit at the very end of the clustered index. The costing model used for row goals can lead to inefficient queries when its assumptions don't match how the data is actually distributed in the pages of the table.
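On versions that don't show "Number of Rows Read" in the actual plan, you can see the same effect by comparing logical reads:

SET STATISTICS IO ON;

select top 1 [location]
from table_name
where size = 10 and [weight] = 999;   -- few logical reads: a match is found almost immediately

select top 1 [location]
from table_name WITH (INDEX(0))
where size = 10 and [weight] = 145;   -- many logical reads: nearly the whole clustered index is read

SET STATISTICS IO OFF;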

Without the hint, you get the plan that you were expecting when filtering on a weight of 145:

[query plan image: good plan]

I suspect what happened is that there was no cached plan for this query, and the first execution happened to use very nonselective filter values for weight and plant. Because of the row goal, SQL Server calculated that a clustered index seek was cheaper than a seek on your nonclustered index plus a key lookup, and that plan was cached for later reuse.
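If you want to confirm that theory, the cached plan records the parameter values it was compiled with. Here is a sketch that pulls them out of the plan XML (@P2 is assumed to be the weight parameter from your prepared statement):

SELECT st.text,
       qp.query_plan.value(
           'declare namespace p="http://schemas.microsoft.com/sqlserver/2004/07/showplan";
            (//p:ColumnReference[@Column="@P2"]/@ParameterCompiledValue)[1]',
           'nvarchar(100)') AS compiled_weight_value
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE cp.objtype = 'Prepared'
  AND st.text LIKE '%table_name%';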

So, what can you do about it? SQL Server 2016 SP1 added a query hint that can disable the row goal optimization for an individual query. The hint could have negative effects on other parts of the plan, but it will disfavor the clustered index seek that you want to avoid:

select top 1 [location]
from table_name 
where size=10 and [weight]=999
OPTION (USE HINT('DISABLE_OPTIMIZER_ROWGOAL'));

Since you're on SQL Server 2012, you can use trace flag 4138, which disables row goals. If you'd like something less drastic, you could define a covering index on the table, add an ORDER BY to make the query deterministic, add an index hint to the query, or force the query plan using any of the options available in SQL Server 2012, such as a plan guide.
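A few of those options, sketched against the demo table (these are sketches only; the real ERP table needs its actual column and index names, and QUERYTRACEON requires sysadmin rights unless applied through a plan guide):

-- 1. Covering index, so the nonclustered seek no longer needs a key lookup:
CREATE INDEX IX_covering ON table_name (size, [weight]) INCLUDE ([location]);

-- 2. Per-query index hint to keep the query on the nonclustered index:
select top 1 [location]
from table_name WITH (INDEX (IX))
where size = 10 and [weight] = 999;

-- 3. Trace flag 4138 scoped to a single statement:
select top 1 [location]
from table_name
where size = 10 and [weight] = 999
OPTION (QUERYTRACEON 4138);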