Sql-server – Execution Plans Differ for the Same SQL Statement

execution-plansql serversql-server-2008-r2sql-server-2012t-sql

As a background to this question the details can be found at this link.

In a nutshell: I have 2 SQL statements that are almost identical except for the RIGHT table. The LEFT JOIN is the same and the returned columns vary because the tables are different. The databases were moved to a SQL Server 2012 instance from 2008. On the 2012 instance, 1 query executes in under 3 seconds, and the other takes almost 2 hours to execute. The execution plans differ and are posted at the link.

However, I can take both queries and execute them on the SQL Server 2008 server and they both complete in under 3 seconds. Both execution plans on the 2008 server are identical to the 'Good' execution plan on the 2012 server.

I have a new statement that will work but the application that it's used in is a C# windows application using inline SQL. I would like to not break that pattern for this 1 statement. Can anyone shed any light on this problem?

Best Answer

The query optimizer has a number of choices when constructing an execution plan for this query. Among the many strategies available, it can choose between hash join and nested loops join. Which one it decides to use depends sensitively on the statistical information available, and other factors like the amount of memory configured for SQL Server to use.

It just so happens that the optimizer chooses a nested loops strategy in one case, and a hash join in the others. If you were to force the use of a hash join (e.g. using the query hint OPTION (HASH JOIN) where a nested loops join is currently being used, you would find that the estimated cost of the nested loops plan appears to be the cheaper option to the optimizer.

This is not a bug. It is a fairly routine example of plan choice being sensitive to the statistical information available (among other things). The fact that the nested loops join performs so poorly in reality is a consequence of the query and database design not being very optimizer-friendly. Given very low-quality information to work with, the optimizer's plan selection is barely better than a guess.

Anyway, assuming you need to avoid the poor-performing plan shape without changing the source code (a reason to prefer stored procedures over in-lining SQL, by the way) you have two main options:

The first is to use a plan guide to force the 'good' plan shape for the target query. This is reasonably advanced work if you haven't worked with plan guides before. Extra steps will be involved if the literal values specified in the example could ever be different.

A second option is to present the optimizer with a more useful index to use. In this case, this involves adding a computed column (a fast metadata-only operation) and then indexing the new column:

-- Metadata-only operation
ALTER TABLE dbo.InHouse_CSV_Backup
ADD MERSNUMBER_CC AS 
    REPLACE(LTRIM(RTRIM([MERSNUMBER])),'-', '');

-- Index on computed column
CREATE NONCLUSTERED INDEX
    IX_dbo_InHouse_CSV_Backup__MERSNUMBER_CC
ON dbo.InHouse_CSV_Backup (MERSNUMBER_CC)
INCLUDE (MERSNUMBER);

The query is very likely to use the index, resulting in better plan stability and most likely good performance too. It is not a perfect solution by any means, but it is a relatively simple and unobtrusive one, given the information available.

Potential areas for future improvement:

Consider stored procedures instead of embedding SQL in the application
Consider NOT EXISTS instead of outer join with null rejection
Avoid the unnecessary RTRIM when comparing strings with trailing spaces
Be careful with implicit type conversion. For example, MERSMin is varchar(300) but MERSNUMBER is nvarchar(31) (Unicode). The results of expressions have similar derived types, e.g. the REPLACE is deemed to produce an nvarchar(4000).

Related Solutions

Sql-server – How Can the Same Query in Two Nearly Identical Instances Generate Two Different Execution Plans

Execution plans (actual, not estimated) need to be added to the Q for a definitive answer but...

How Can the Same Query in Two Nearly Identical Instances Generate Two Different Execution Plans?

Because, by your admission, they are not identical. Most likely explanation for the different execution plans is a variance in statistics.

Table rows counts are slightly larger on Server A... Statistics are updated nightly at the same time on both servers.

Row counts are different and the statistics update was probably a default sample, rather than FULLSCAN. I've witnessed some comically unfortunate stats histograms as the result of sampling, each of which has been corrected by a FULLSCAN update.

Sql-server – Does SQL Server 2008 store the creation date of execution plans

It isn't stored in sys.dm_exec_cached_plans, nor is it buried anywhere in the plan XML that I can find. There is useful information in other DMVs however.

For stored procedures we can get the time a plan was cached from sys.dm_exec_procedure_stats:

SELECT TOP(250) 
    p.name AS [SP Name]
    , ps.execution_count
    , ps.cached_time
FROM 
    sys.procedures p WITH (NOLOCK)
INNER JOIN 
    sys.dm_exec_procedure_stats ps WITH (NOLOCK)
ON  p.[object_id] = ps.[object_id]
WHERE 
    ps.database_id = DB_ID()
ORDER BY 
    ps.cached_time DESC 
OPTION 
    (RECOMPILE);

For ad-hoc queries, there is the creation time in sys.dm_exec_query_stats:

SELECT TOP(250) 
    st.[text] AS [QueryText]
    , qs.execution_count
    , qs.creation_time
FROM 
    sys.dm_exec_cached_plans cp WITH (NOLOCK)
INNER JOIN
    sys.dm_exec_query_stats qs WITH (NOLOCK)
ON  qs.plan_handle = cp.plan_handle
CROSS APPLY 
    sys.dm_exec_sql_text(cp.plan_handle) st
WHERE 
    cp.objtype = N'Adhoc' 
ORDER BY 
    qs.creation_time DESC 
OPTION 
    (RECOMPILE);

According to @SqlKiwi... the change from creation_time to cached_time was just because procedures and triggers were added in 2008 and the opportunity was taken to choose a more descriptive name. The created/cached time reflects the last compilation, not the creation time of the original plan.

Best Answer

Related Solutions

Sql-server – How Can the Same Query in Two Nearly Identical Instances Generate Two Different Execution Plans

Sql-server – Does SQL Server 2008 store the creation date of execution plans

Related Question