Sql-server – Execution Plans Differ for the Same SQL Statement

execution-plansql serversql-server-2008-r2sql-server-2012t-sql

As a background to this question the details can be found at this link.

In a nutshell: I have 2 SQL statements that are almost identical except for the RIGHT table. The LEFT JOIN is the same and the returned columns vary because the tables are different. The databases were moved to a SQL Server 2012 instance from 2008. On the 2012 instance, 1 query executes in under 3 seconds, and the other takes almost 2 hours to execute. The execution plans differ and are posted at the link.

However, I can take both queries and execute them on the SQL Server 2008 server and they both complete in under 3 seconds. Both execution plans on the 2008 server are identical to the 'Good' execution plan on the 2012 server.

I have a new statement that will work but the application that it's used in is a C# windows application using inline SQL. I would like to not break that pattern for this 1 statement. Can anyone shed any light on this problem?

Best Answer

The query optimizer has a number of choices when constructing an execution plan for this query. Among the many strategies available, it can choose between hash join and nested loops join. Which one it decides to use depends sensitively on the statistical information available, and other factors like the amount of memory configured for SQL Server to use.

It just so happens that the optimizer chooses a nested loops strategy in one case, and a hash join in the others. If you were to force the use of a hash join (e.g. using the query hint OPTION (HASH JOIN) where a nested loops join is currently being used, you would find that the estimated cost of the nested loops plan appears to be the cheaper option to the optimizer.

This is not a bug. It is a fairly routine example of plan choice being sensitive to the statistical information available (among other things). The fact that the nested loops join performs so poorly in reality is a consequence of the query and database design not being very optimizer-friendly. Given very low-quality information to work with, the optimizer's plan selection is barely better than a guess.

Anyway, assuming you need to avoid the poor-performing plan shape without changing the source code (a reason to prefer stored procedures over in-lining SQL, by the way) you have two main options:

The first is to use a plan guide to force the 'good' plan shape for the target query. This is reasonably advanced work if you haven't worked with plan guides before. Extra steps will be involved if the literal values specified in the example could ever be different.

A second option is to present the optimizer with a more useful index to use. In this case, this involves adding a computed column (a fast metadata-only operation) and then indexing the new column:

-- Metadata-only operation
ALTER TABLE dbo.InHouse_CSV_Backup
ADD MERSNUMBER_CC AS 
    REPLACE(LTRIM(RTRIM([MERSNUMBER])),'-', '');

-- Index on computed column
CREATE NONCLUSTERED INDEX
    IX_dbo_InHouse_CSV_Backup__MERSNUMBER_CC
ON dbo.InHouse_CSV_Backup (MERSNUMBER_CC)
INCLUDE (MERSNUMBER);

The query is very likely to use the index, resulting in better plan stability and most likely good performance too. It is not a perfect solution by any means, but it is a relatively simple and unobtrusive one, given the information available.

Potential areas for future improvement: