I have a problem understanding why SQL server decides to call user defined function for every value in the table even though only one row should be fetched. The actual SQL is a lot more complex, but I was able to reduce the problem down to this:
select
S.GROUPCODE,
H.ORDERCATEGORY
from
ORDERLINE L
join ORDERHDR H on H.ORDERID = L.ORDERID
join PRODUCT P on P.PRODUCT = L.PRODUCT
cross apply dbo.GetGroupCode (P.FACTORY) S
where
L.ORDERNUMBER = 'XXX/YYY-123456' and
L.RMPHASE = '0' and
L.ORDERLINE = '01'
For this query, SQL Server decides to call GetGroupCode function for every single value that exists in PRODUCT Table, even though the estimate and actual number of rows returned from ORDERLINE is 1 (it's the primary key):
Same plan in plan explorer showing the row counts:
Tables:
ORDERLINE: 1.5M rows, primary key: ORDERNUMBER + ORDERLINE + RMPHASE (clustered)
ORDERHDR: 900k rows, primary key: ORDERID (clustered)
PRODUCT: 6655 rows, primary key: PRODUCT (clustered)
The index being used for the scan is:
create unique nonclustered index PRODUCT_FACTORY on PRODUCT (PRODUCT, FACTORY)
The function is actually slightly more complex, but the same thing happens with a dummy multi-statement function like this:
create function GetGroupCode (@FACTORY varchar(4))
returns @t table(
TYPE varchar(8),
GROUPCODE varchar(30)
)
as begin
insert into @t (TYPE, GROUPCODE) values ('XX', 'YY')
return
end
I was able to "fix" the performance by forcing SQL server to fetch the top 1 product, although 1 is max that can ever be found:
select
S.GROUPCODE,
H.ORDERCAT
from
ORDERLINE L
join ORDERHDR H
on H.ORDERID = M.ORDERID
cross apply (select top 1 P.FACTORY from PRODUCT P where P.PRODUCT = L.PRODUCT) P
cross apply dbo.GetGroupCode (P.FACTORY) S
where
L.ORDERNUMBER = 'XXX/YYY-123456' and
L.RMPHASE = '0' and
L.ORDERLINE = '01'
Then the plan shape also changes to be something I expected it to be originally:
I also though that the index PRODUCT_FACTORY being smaller than the clustered index PRODUCT_PK would have an affect, but even with forcing the query to use PRODUCT_PK, the plan is still the same as original, with 6655 calls to the function.
If I leave out ORDERHDR completely, then the plan starts with nested loop between ORDERLINE and PRODUCT first, and the function is called only once.
I would like to understand what could be the reason for this since all the operations are done using primary keys and how to fix it if it happens in a more complex query that can't be solved this easily.
Edit: Create table statements:
CREATE TABLE dbo.ORDERHDR(
ORDERID varchar(8) NOT NULL,
ORDERCATEGORY varchar(2) NULL,
CONSTRAINT ORDERHDR_PK PRIMARY KEY CLUSTERED (ORDERID)
)
CREATE TABLE dbo.ORDERLINE(
ORDERNUMBER varchar(16) NOT NULL,
RMPHASE char(1) NOT NULL,
ORDERLINE char(2) NOT NULL,
ORDERID varchar(8) NOT NULL,
PRODUCT varchar(8) NOT NULL,
CONSTRAINT ORDERLINE_PK PRIMARY KEY CLUSTERED (ORDERNUMBER,ORDERLINE,RMPHASE)
)
CREATE TABLE dbo.PRODUCT(
PRODUCT varchar(8) NOT NULL,
FACTORY varchar(4) NULL,
CONSTRAINT PRODUCT_PK PRIMARY KEY CLUSTERED (PRODUCT)
)
Best Answer
There are three main technical reasons you get the plan you do:
Given the small cardinality estimate assigned to the UDF apply, the n-ary join expansion heuristics unfortunately reposition it earlier in the tree than you would wish.
The query also qualifies for search 0 optimization by virtue of having at least three joins (including applies). The final physical plan you get, with the odd-looking scan, is based on that heuristically-deduced join order. It is costed low enough that the optimizer considers the plan "good enough". The low cost estimation and cardinality for the UDF contributes to this early finish.
Search 0 (also known as the Transaction Processing phase) targets low-cardinality OLTP-type queries, with final plans that usually feature nested loops joins. More importantly, search 0 runs only a relatively small subset of the optimizer's exploration abilities. This subset does not include pulling an apply up the query tree over a join (rule
PullApplyOverJoin
). This is exactly what is required in the test case to reposition the UDF apply above the joins, to appear last in the sequence of operations (as it were).There is also an issue where the optimizer can decide between naive nested loops join (join predicate on the join itself) and a correlated indexed join (apply) where the correlated predicate is applied on the inner side of the join using an index seek. The latter is usually the desired plan shape, but the optimizer is capable of exploring both. With incorrect costing and cardinality estimates, it can choose the non-apply NL join, as in the submitted plans (explaining the scan).
So, there are multiple interacting reasons involving several general optimizer features that normally work well to find good plans in a short period of time without using excessive resources. Avoiding any one of the reasons is enough to produce the 'expected' plan shape for the sample query, even with empty tables:
There is no supported way to avoid search 0 plan selection, early optimizer termination, or to improve the costing of UDFs (aside from the limited enhancements in the SQL Server 2014 CE model for this). This leaves things like plan guides, manual query rewrites (including the
TOP (1)
idea or using intermediate temporary tables) and avoiding poorly-costed 'black boxes' (from a QO point of view) like non-inline functions.Rewriting
CROSS APPLY
asOUTER APPLY
can also work, as it currently prevents some of the early join-collapsing work, but you have to be careful to preserve the original query semantics (e.g. rejecting anyNULL
-extended rows that might be introduced, without the optimizer collapsing back to a cross apply). You need to be aware though that this behaviour is not guaranteed to remain stable, so you would need to remember to retest any such observed behaviours each time you patch or upgrade the SQL Server.Overall, the right solution for you depends on a variety of factors that we cannot judge for you. I would, however, encourage you to consider solutions that are guaranteed to always work in future, and that work with (rather than against) the optimizer wherever possible.