SQL Server – Need to understand the drastic difference in execution plans between two queries

execution-plan, optimization, performance, query-performance, sql-server

I have 2 queries.
select guid=NEWID()
into #guid

--update statistics #guid with all
--create statistics s1 on #guid(guid)

select * from Party a join #guid b on a.[Party_GUID] = b.guid --first select
select * from Party a join #guid b on a.[Party_GUID] = b.guid or b.guid is null --second select

drop table #guid

There is an index on Party.Party_GUID. The execution plan for the first select is excellent; the plan for the second select is horrible. I need to understand the reason for this. There is only one row in the #guid table and it has a non-null value, so the query optimizer should be able to create a similar execution plan for the second query too. Am I expecting too much of the optimizer? I have tried this on SQL Server 2008 R2 as well as 2012.

Admittedly this is a contrived reproduction of an issue that I am currently encountering in our system. The developers have coded a stored procedure with a table-valued parameter which holds various combinations of search values to be applied to a table. E.g. the table-valued parameter can have 3 nullable fields, GUID, LastName, FirstName, and the application can populate it with values such as (N'89241068-7068-4728-9CD0-A565FC2BFDEB', null, null), (null, N'smith', N'john'), (null, null, N'jane'). The expectation is that the stored procedure applies it as a filter, e.g.:
select * from Party a join @tablevar b
    on  (a.Party_GUID = b.guid      or b.guid      is null)
    and (a.LastName   = b.LastName  or b.LastName  is null)
    and (a.FirstName  = b.FirstName or b.FirstName is null)

One can argue that this is a tough query to optimize, but it is what it is currently, and I am trying to look at avenues to help the optimizer come up with the best query plans. I do understand that some inputs can result in a horrible plan. What I am trying to understand is why the "OR … is null" clause degrades the execution plan so much even when there are indexes on the table.

I know there are DBCC trace flags which would help me understand why the optimizer chose a particular plan, but I find those hard to comprehend.

Any help appreciated.

Best Answer

To summarize: the application sends a TVP where each row is a set of search parameters, with a null value in the TVP functioning as a "wildcard" indicating no filtering on that attribute. The goal of the query is to return all the rows in the target table that match any of the rows in the input TVP.

So if the TVP sends (Id=123, Name=null), (Id=null, Name='Joe'), the procedure should return all rows that match either the first set of criteria or the second.

Am I expecting too much of the optimizer?

Yes. For this to work well, the QO would need to create a separate plan for each row in the input TVP, and it simply was never built to do that. For each row in the TVP a table scan will be required, as no single index can be used to evaluate the join criteria.

So you actually need to run a separate query for each row in the input TVP. You can cursor over the rows and load a temp table, reducing this to an iterative form of a classic dynamic search query, for which you can use dynamic SQL or OPTION (RECOMPILE) to get tailored execution plans.
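A minimal sketch of that iterative approach, assuming the three TVP columns described in the question; the variable names, column types, and the #results temp table are illustrative, not part of the original procedure:

-- Sketch only: @tablevar is the TVP from the question; names and
-- types here are assumptions for illustration.
declare @guid uniqueidentifier, @last nvarchar(100), @first nvarchar(100);

select top (0) * into #results from Party;  -- empty table with Party's shape

declare c cursor local fast_forward for
    select guid, LastName, FirstName from @tablevar;
open c;
fetch next from c into @guid, @last, @first;

while @@FETCH_STATUS = 0
begin
    insert into #results
    select a.*
    from Party a
    where (a.Party_GUID = @guid  or @guid  is null)
      and (a.LastName   = @last  or @last  is null)
      and (a.FirstName  = @first or @first is null)
    option (recompile);  -- compile a plan tailored to which values are null

    fetch next from c into @guid, @last, @first;
end

close c;
deallocate c;

select distinct * from #results;  -- a row can match more than one criteria row

Because each statement sees local variables and is recompiled per execution, the optimizer can discard the "or @x is null" branches that are known to be false and use an index seek for the remaining predicates.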

As for this:

What I am trying to understand is why the "OR … is null" clause degrades the execution plan so much even when there are indexes on the table.

The answer is simple and not especially relevant here: in the sample you posted, if b.guid is null, then the query returns every row in Party, so an index seek on Party_GUID cannot satisfy the predicate and a scan is required.
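To see this concretely, repeating the repro from the question but setting the single #guid row to null:

-- With the one row in #guid holding null, the ON clause
-- (a.[Party_GUID] = b.guid or b.guid is null) is true for every Party row,
-- so the join degenerates into a full scan of Party.
update #guid set guid = null;

select count(*)
from Party a
join #guid b on a.[Party_GUID] = b.guid or b.guid is null;
-- same count as: select count(*) from Party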