Sql-server – Why does this stored proc cause a clustered index scan, but seek when using OPTION RECOMPILE

execution-planoptimizationparameterrecompilesql server

I think I may know the answer based on my research, but am looking for confirmation on how/why the engine compiles the plan the way it does with

Parameters being passed in:
@ID int
,@OtherID INT

SELECT  b.Column1
        ,b.Column2
        ,b.Column3
        ,b.Column4
        ,b.Column5
        ,c.Column1
        ,b.Column1
        ,e.Column1
FROM    Table1 AS b 
        inner join Table2 AS t
        on b.ID = t.ID
        left join [LINKED SERVER].[DB].dbo.Table3 as c
        on b.ID = c.ID
        left join Table4 AS e
        on b.ID= e.ID
where   (b.ID = @ID or @ID= 0)
        And b.ID = @OtherID
        And b.ID IS NOT NULL
        and e.ID = 1

Now I have determined that the cause of the index scan is because of this line: where (b.ID = @ID or @ID= 0). More specifically, @ID = 0. To clarify even further, 0 for that ID field does not exist as a value in the underlying table, it was simply something a developer did to allow a user to pull back all of the results by passing in 0 to the parameter and then checking to see if that parameter is 0 so as a result more rows are pulled back (typically, you would just return 1-3 results).

Now, what is extremely odd, is that if I add OPTION RECOMPILE, the engine is able to create a much better plan at the cost of overhead (compilation time) of course:

What I would like to know is how is this possible. From what I have read online, by using OPTION RECOMPILE, the engine will literally replace the value with the actual value passed into the parameter and it can very easily see that @ID 1234 does not equal 0. However, if you don't use OPTION RECOMPILE the engine will take the total # of records, which is 120,000, and divide it by the total number of distinct possibilities, 107,000. This comes out to about 1.1 estimated rows being returned and I confirmed this by looking at the estimated properties of the plan that has the index scan, but why would the engine continue to index scan if the estimation is correct? I even updated stats just to be sure.

Best Answer

b.ID = @ID OR @ID = 0

The optimizer has to produce a plan with an index scan, because the plan is cached and reused.

On a subsequent execution, the parameter @ID might be zero. An index seek is of no value in that case, because there is no ID value to seek to. Other times, there will be a non-zero value provided for @ID, but the cached plan has to work correctly for all possible parameter values.

When OPTION (RECOMPILE) is used, the Parameter Embedding Optimization (PEO) means the current value for @ID is used in place of the parameter on each execution, and no plan is cached.

Say @ID is 1234. After PEO, the optimizer sees:

b.ID = 1234 OR 1234 = 0

That is simplified by the contradiction detection logic to:

b.ID = 1234

...which enables a seek on ID.

For further reading please see my article Parameter Sniffing, Embedding, and the RECOMPILE Options.

Best Answer

Related Solutions

Sql-server – Why isn’t SQL Server using the non-clustered index and doing a clustered index scan

Sql-server – Why does the first query scan and the second seek

Related Question