Sql-server – Monumental difference in execution time between queries when using RECOMPILE query hint

optimizationperformancequery-performancesql serversql-server-2005

I have two almost identical queries running on the same SQL Server 2005 instance:

  1. The first one is the original SELECT query as generated by LINQ (I know, I know… I'm not the application developer, just the DBA :).
  2. The second one is exactly the same as the first one, added a OPTION (RECOMPILE) at the end.

Nothing else has been changed.

The first one takes 55 seconds everytime its run.
The second one takes 2 seconds.

Both result sets are identical.

Why would this hint generate such a dramatic gain in performance?

The Books Online entry on RECOMPILE doesn't offer a much detailed explanation:

Instructs the SQL Server Database Engine to discard the plan generated
for the query after it executes, forcing the query optimizer to
recompile a query plan the next time the same query is executed.
Without specifying RECOMPILE, the Database Engine caches query plans
and reuses them. When compiling query plans, the RECOMPILE query hint
uses the current values of any local variables in the query and, if
the query is inside a stored procedure, the current values passed to
any parameters.

RECOMPILE is a useful alternative to creating a stored procedure that
uses the WITH RECOMPILE clause when only a subset of queries inside
the stored procedure, instead of the whole stored procedure, must be
recompiled. For more information, see Recompiling Stored Procedures.
RECOMPILE is also useful when you create plan guides. For more
information, see Optimizing Queries in Deployed Applications by Using
Plan Guides.

Since my query has a lot of local variables, my guess is that SQL Server is able to (seriously) optimize it when I use the OPTION (RECOMPILE) query hint.

Everywhere I look people are saying that OPTION (RECOMPILE) should be avoided. The explanation for this is generally that using this hint SQL Server is not able to reuse this exection plan and therefore has to waste time recompiling it everytime.
(But) Given the gigantic performance advantage, I'm inclined to think that using this query hint this time would be a good thing.

Should I use it?
If not, is there a way I can force SQL Server to use a better execution plan without this hint and without altering the application?

Best Answer

As documented in the article Statistics Used by the Query Optimizer in Microsoft SQL Server 2005

If you use a local variable in a query predicate instead of a parameter or literal, the optimizer resorts to a reduced-quality estimate, or a guess for selectivity of the predicate. Use parameters or literals in the query instead of local variables

When the optimiser has no usable statistics at all for a column it will guess that an = predicate will match 10% of rows, BETWEEN 9%, and any of >, >=, < and <= will match 30%. If there are column statistics available an = predicate will be treated differently as below.

Even when local variables are used in a query, an estimate that is better than a guess is used in the case of equality predicates. Selectivity for conditions of the form "@local_variable = column_name" is estimated using the average value frequency from the histogram for column_name. So, for example, if the column column_name contains all unique values, then a selectivity estimate of 1/(number of unique values in column) will be used, which is accurate.

So this is essentially the same as using for OPTIMIZE FOR (UNKNOWN). It may well be more accurate than a flat 10% guess but it isn't tailored to the specific values that you are querying.

To force SQL Server to optimize a query every time it is run, and use the values of local variables to estimate cardinality and cost during optimization of the query, add the RECOMPILE query hint to the query.

With the use of RECOMPILE you presumably are getting more accurate cardinality estimates and therefore a different plan with join orders/ join types more suited to the number of rows returned from different parts of your actual query.