Sql-server – Seek predicate not using all available columns

sql serversql-server-2008-r2

I have a strange query compilation issue which is hard to reproduce. It only happens under high load and cannot be easily repeated.

There is a table T with columns A,B,C,D.
There is a non-unique clustered index on T(A, B, C, D).
There is a query SELECT * FROM T WHERE A=@P1 AND B=@P2 AND (C=@P3 OR
C=@P4) AND D=@P5. The seek condition is on all columns of the
clustered index, the 3-rd column has an OR.

The problem is that the query plan for this query has Seek Predicate only on A and B! The predicate on C and D is an ordinary predicate, so this means that search tree on columns C and D is not utilized.

The data types for all parameters match column data types.

Could anyone provide any hints on why this could be happening? SQL version is 2008 R2 (SP1) – 10.50.2789.0 (X64)

Best Answer

For a parameterised query It can't just do two seeks on

WHERE A=@P1 AND B=@P2 AND C=@P3 AND D=@P5

and

WHERE A=@P1 AND B=@P2 AND C=@P4 AND D=@P5

Because if @P3 = @P4 that would incorrectly bring back duplicate rows. So it would need an operator that removed duplicates from these first.

From a quick test this end it appears to be dependant upon the size of the table whether or not you get that. In the test below 245/246 rows is the cut off point between plans (this was also the cut off point between the index fitting all on one page and it becoming 2 leaf pages and a root page).

CREATE TABLE T(A INT,B INT,C INT,D INT)

INSERT INTO T
SELECT TOP (245) 1,2,3,5
FROM master..spt_values v1

CREATE CLUSTERED INDEX IX ON T(A, B, C, D)

SELECT index_level,page_count, record_count
FROM sys.dm_db_index_physical_stats(db_id(),object_id('T'),1,NULL, 'DETAILED')

DECLARE @C1 INT = 3,
        @C2 INT = 4

 SELECT * FROM T WHERE A=1 AND B=2 AND (C=@C1 OR C=@C2) AND D=5

 DROP TABLE T

1 Pages / 245 rows

This plan has a seek on A=1 AND B=2 with a residual predicate on (C=@C1 OR C=@C2) AND D=5

Plan 1

2 leaf Pages / 246 rows

Plan 2

In the second plan the extra operators are responsible for removing any duplicates from @C1,@C2 first before performing the seek(s).

The seek in the second plan is actually a range seek between A=1 AND B=2 AND C > Expr1010 and A=1 AND B=2 AND C < Expr1011 with a residual predicate on D=5. It still isn't an equality seek on all 4 columns. More information about the additional plan operators can be found here.

Adding OPTION (RECOMPILE) does allow it to inspect the parameter values for duplicates at compile time and produces a plan with two equality seeks.

You could also achieve that with

;WITH CTE
     AS (SELECT DISTINCT ( C )
         FROM   (VALUES (@C1),
                        (@C2)) V(C))
SELECT CA.*
FROM   CTE
       CROSS APPLY (SELECT *
                    FROM   T
                    WHERE A=1 AND B=2 AND D=5  AND C = CTE.C) CA

Plan 3

But actually in this test case it would likely be counter productive as having two seeks into the single page index rather than one increases the logical IO.

Related Solutions

Sql-server – Force Sql server to use index seek in merge join

To get a clustered index seek, you'd need a clustered index that supports your filter (e.g. leading key would have to be Personal_ID, not ID).

You can't force a seek if there's no index with a leading column of Personal_ID that supports the filter.

This does not mean you should change the existing clustered index, unless this is the only query you ever run against the table.

And while you could create a non-clustered index with Personal_ID as the key column, a seek on that index might not be what you want anyway - since you're using SELECT * (are you really sure you need all columns from both tables?), it will need to go fetch the rest of the columns from the clustered index anyway, and if there are more than _{some number of rows} returned, at some point a seek (well, what would amount to a range scan disguised as a seek) + lookups will be more expensive than a regular scan.

Why do you think you need a seek here? How many rows does the query return, how wide are they, and how long does it take? Is this just educational, or are you under the assumption that a seek will always perform better than a scan? (It won't, btw.)

Some useful reading:

Sql-server – Index SEEK is not used unless OPTION (RECOMPILE)

Summarizing some of the main points from our chat room discussion:

Generally speaking, SQL Server caches a single plan for each statement. That plan must be valid for all possible future parameter values.

It is not possible to cache a seek plan for your query, because that plan would not be valid if, for example, @productid is null.

In some future release, SQL Server might support a single plan that dynamically chooses between a scan and a seek, depending on runtime parameter values, but that is not something we have today.

General problem class

Your query is an example of a pattern variously referred to as a "catch all" or "dynamic search" query. There are various solutions, each with their own advantages and disadvantages. In modern versions of SQL Server (2008+), the main options are:

IF blocks
OPTION (RECOMPILE)
Dynamic SQL using sp_executesql

The most comprehensive work on the topic is probably by Erland Sommarskog, which is included in the references at the end of this answer. There is no getting away from the complexities involved, so it is necessary to invest some time in trying each option out to understand the trade-offs in each case.

`IF` blocks

To illustrate an IF block solution for the specific case in the question:

IF @productid IS NOT NULL AND @priceid IS NOT NULL
BEGIN
    SELECT 
        T.productID,
        T.priceID
    FROM dbo.Transactions AS T
    WHERE
        T.productID = @productid
        AND T.priceID = @priceid;
END;
ELSE IF @productid IS NOT NULL
BEGIN
    SELECT 
        T.productID,
        T.priceID
    FROM dbo.Transactions AS T
    WHERE
        T.productID = @productid;
END;
ELSE IF @priceid IS NOT NULL
BEGIN
    SELECT 
        T.productID,
        T.priceID
    FROM dbo.Transactions AS T
    WHERE
        T.priceID = @priceid;
END;
ELSE
BEGIN
    SELECT 
        T.productID,
        T.priceID
    FROM dbo.Transactions AS T;
END;

This contains a separate statement for the four possible null-or-not-null cases for each of the two parameters (or local variables), so there are four plans.

There is a potential problem there with parameter sniffing, which might require an OPTIMIZE FOR hint on each query. Please see the references section to explore these types of subtleties.

Recompile

As noted above an in the question, you could also add an OPTION (RECOMPILE) hint to get a fresh plan (seek or scan) on each invocation. Given the relatively slow frequency of calls in your case (once every ten seconds on average, with a sub-millisecond compilation time) it seems likely this option will be suitable for you:

SELECT
    T.productID,
    T.priceID
FROM dbo.Transactions AS T
WHERE
    (T.productID = @productid OR @productid IS NULL)
    AND (T.priceID = @priceid OR @priceid IS NULL)
OPTION (RECOMPILE);

It is also possible to combine features from the above options in creative ways, to make the most of the advantages of each method, while minimizing the downsides. There really is no shortcut to understanding this stuff in detail, then making an informed choice backed by realistic testing.