I've faced the similar problem on my production environment, but I managed to reproduce this behavior on Northwind DB.
Consider the following query:
USE NORTHWND;
DECLARE @ids TABLE ( Id NCHAR(50) );
INSERT INTO @ids
VALUES ( N'AROUT' ),
( N'ALFKI' );
SELECT *
FROM ( SELECT c.CustomerID
FROM dbo.Customers c
WHERE c.CustomerID IN ( SELECT Id
FROM @ids )
UNION ALL
SELECT '0'
) t1
JOIN ( SELECT o.CustomerID
FROM dbo.Orders o
--LEFT JOIN dbo.[Order Details] od ON o.OrderID = od.OrderID
UNION ALL
SELECT '0'
) t2 ON t2.CustomerID = t1.CustomerID
OPTION ( RECOMPILE );
It has pretty good query plan:
Now when I uncomment line with LEFT JOIN
, the plan becomes not as good (seems like query processor can't push the CustomerID
filter to the data access operators):
When I add FORCESEEK
hint on dbo.Orders
, query processor can't produce a query plan.
On the other hand, when I remove one of the UNION ALL
, the query plan becomes as good as the first one.
Is that expected behavior? Why does the query processor not put filter by CustomerID
before the join operator?
Best Answer
I think that SQL Server simply does not have the appropriate optimization rules to yield the seek-into-
Orders
query that you are looking for in the case of aUNION ALL
on both sides of the query. Such a query plan is theoretically possible, but the query optimizer is not capable of producing it for your query.In order to reach this conclusion, I compared the full original query (including the
LEFT JOIN
) to an alternate formulation of the query that yields a much lower estimated cost (0.042
vs.0.085
). So SQL Server would be likely to choose this lower cost alternative if it were able to explore this plan shape for your query.The original query plan
Here is the original plan, with an estimated cost of
0.085
.The alternate query plan
This query is semantically equivalent to the original query but has an estimated cost of
0.042
, about half the cost. It is formed by promoting the second UNION ALL up to the top-level of the query. This requires us to referencet1
(the set of customers) twice, but even so yields a plan with half the cost by allowing seeks into theOrders
and[Order Details]
tables.The alternate query
Here is the full query you can use to play around with this approach:
Takeaways / caveats
Concatenation
operators are good at getting in the way of the query optimizer and preventing it from yielding an optimal plan. A couple examples include this Connect issue whereUNION ALL
prevents optimized bitmap filters as well as repeated observations thatConcatenation
operators, particularly if nested, fool the SQL 2012 and earlier Cardinality Estimator and may yield sub-optimal plans due to poor cardinality estimates. ObviouslyUNION ALL
is a very useful tool, but it's worth being aware that it does occasionally limit the query optimizer's ability to optimize your query.SELECT '0'
.)Northwind
are so small that it's difficult to draw conclusions, so testing with your real world data set will be important.