SQL Server – Need to understand the drastic difference in execution plans between two queries

execution-plan, optimization, performance, query-performance, sql-server

I have 2 queries.
select guid=NEWID()
into #guid

--update statistics #guid with all
--create statistics s1 on #guid(guid)

select * from Party a join #guid b on a.[Party_GUID] = b.guid --first select
select * from Party a join #guid b on a.[Party_GUID] = b.guid or b.guid is null --second select

drop table #guid

There is an index on Party.Party_GUID. The execution plan for the first select is excellent; the plan for the second select is horrible. I need to understand the reason for this. There is only one row in the #guid table and it has a non-null value, so the query optimizer should be able to create a similar execution plan for the second query too. Am I expecting too much of the optimizer? I have tried this on SQL Server 2008 R2 as well as 2012.

Admittedly this is a contrived reproduction of an issue that I am currently encountering in our system. The developers have coded a stored procedure with a table-valued parameter which holds various combinations of search values to be applied to a table. E.g. the table-valued parameter can have 3 nullable fields, GUID, LastName, FirstName, and the application can populate it with values such as (N'89241068-7068-4728-9CD0-A565FC2BFDEB', null, null), (null, N'smith', N'john'), (null, null, N'jane'). The expectation is that the stored procedure applies it as a filter, e.g.:
select * from Party a join @tablevar b
    on  (a.Party_GUID = b.guid      or b.guid      is null)
    and (a.LastName   = b.LastName  or b.LastName  is null)
    and (a.FirstName  = b.FirstName or b.FirstName is null)

One can argue that this is a tough query to optimize, but it is what it is currently, and I am trying to look at avenues to help the optimizer come up with the best query plans. I do understand that some inputs can result in a horrible plan. What I am trying to understand is why the "OR … is null" clause degrades the execution plan so much even when there are indexes on the table.

I know there are DBCC trace flags which would help me understand why the optimizer chose a particular plan, but I find those hard to comprehend.

Any help appreciated.

Best Answer

To summarize: the application sends a TVP where each row is a set of search parameters, with a null value in the TVP functioning as a "wildcard" indicating no filtering on that attribute. The goal of the query is to return all the rows in the target table that match any of the rows in the input TVP.

So if the TVP sends (Id=123, Name=null), (Id=null, Name='Joe'), the procedure should return all rows that match either the first set of criteria or the second.

Am I expecting too much of the optimizer?

Yes. For this to work well, the QO would need to create a separate plan for each row in the input TVP, and it simply was never built to do that. For each row in the TVP a table scan will be required, as no single index can be used to evaluate the join criteria.

So you actually need to run a separate query for each row in the input TVP. You can cursor over the rows and load a temp table, reducing this to an iterative form of a classic dynamic search query, for which you can use dynamic SQL or OPTION (RECOMPILE) to get tailored execution plans.
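A minimal sketch of that iterative approach, assuming the three TVP columns described in the question; the variable names, column types, and the #results temp table are illustrative, not part of the original procedure:

-- Sketch only: @tablevar is the TVP from the question; names and
-- types here are assumptions for illustration.
declare @guid uniqueidentifier, @last nvarchar(100), @first nvarchar(100);

select top (0) * into #results from Party;  -- empty table with Party's shape

declare c cursor local fast_forward for
    select guid, LastName, FirstName from @tablevar;
open c;
fetch next from c into @guid, @last, @first;

while @@FETCH_STATUS = 0
begin
    insert into #results
    select a.*
    from Party a
    where (a.Party_GUID = @guid  or @guid  is null)
      and (a.LastName   = @last  or @last  is null)
      and (a.FirstName  = @first or @first is null)
    option (recompile);  -- compile a plan tailored to which values are null

    fetch next from c into @guid, @last, @first;
end

close c;
deallocate c;

select distinct * from #results;  -- a row can match more than one criteria row

Because each statement sees local variables and is recompiled per execution, the optimizer can discard the "or @x is null" branches that are known to be false and use an index seek for the remaining predicates.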

As for this:

What I am trying to understand is why the "OR … is null" clause degrades the execution plan so much even when there are indexes on the table.

The answer is simple and not especially relevant here: in the sample you posted, if b.guid is null, then the query returns every row in Party, so an index seek on Party_GUID cannot satisfy the predicate and a scan is required.
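To see this concretely, repeating the repro from the question but setting the single #guid row to null:

-- With the one row in #guid holding null, the ON clause
-- (a.[Party_GUID] = b.guid or b.guid is null) is true for every Party row,
-- so the join degenerates into a full scan of Party.
update #guid set guid = null;

select count(*)
from Party a
join #guid b on a.[Party_GUID] = b.guid or b.guid is null;
-- same count as: select count(*) from Party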