Sql-server – Optimization issue with user defined function

sql serversql-server-2005

I have a problem understanding why SQL server decides to call user defined function for every value in the table even though only one row should be fetched. The actual SQL is a lot more complex, but I was able to reduce the problem down to this:

select  
    S.GROUPCODE,
    H.ORDERCATEGORY
from    
    ORDERLINE L
    join ORDERHDR H on H.ORDERID = L.ORDERID
    join PRODUCT P  on P.PRODUCT = L.PRODUCT    
    cross apply dbo.GetGroupCode (P.FACTORY) S
where   
    L.ORDERNUMBER = 'XXX/YYY-123456' and
    L.RMPHASE = '0' and
    L.ORDERLINE = '01'

For this query, SQL Server decides to call GetGroupCode function for every single value that exists in PRODUCT Table, even though the estimate and actual number of rows returned from ORDERLINE is 1 (it's the primary key):

Query Plan

Same plan in plan explorer showing the row counts:

Plan explorer
Tables:

ORDERLINE: 1.5M rows, primary key: ORDERNUMBER + ORDERLINE + RMPHASE (clustered)
ORDERHDR:  900k rows, primary key: ORDERID (clustered)
PRODUCT:   6655 rows, primary key: PRODUCT (clustered)

The index being used for the scan is:

create unique nonclustered index PRODUCT_FACTORY on PRODUCT (PRODUCT, FACTORY)

The function is actually slightly more complex, but the same thing happens with a dummy multi-statement function like this:

create function GetGroupCode (@FACTORY varchar(4))
returns @t table(
    TYPE        varchar(8),
    GROUPCODE   varchar(30)
)
as begin
    insert into @t (TYPE, GROUPCODE) values ('XX', 'YY')
    return
end

I was able to "fix" the performance by forcing SQL server to fetch the top 1 product, although 1 is max that can ever be found:

select  
    S.GROUPCODE,
    H.ORDERCAT
from    
    ORDERLINE L
    join ORDERHDR H
        on H.ORDERID = M.ORDERID
    cross apply (select top 1 P.FACTORY from PRODUCT P where P.PRODUCT = L.PRODUCT) P
    cross apply dbo.GetGroupCode (P.FACTORY) S
where   
    L.ORDERNUMBER = 'XXX/YYY-123456' and
    L.RMPHASE = '0' and
    L.ORDERLINE = '01'

Then the plan shape also changes to be something I expected it to be originally:

Query Plan with top

I also though that the index PRODUCT_FACTORY being smaller than the clustered index PRODUCT_PK would have an affect, but even with forcing the query to use PRODUCT_PK, the plan is still the same as original, with 6655 calls to the function.

If I leave out ORDERHDR completely, then the plan starts with nested loop between ORDERLINE and PRODUCT first, and the function is called only once.

I would like to understand what could be the reason for this since all the operations are done using primary keys and how to fix it if it happens in a more complex query that can't be solved this easily.

Edit: Create table statements:

CREATE TABLE dbo.ORDERHDR(
    ORDERID varchar(8) NOT NULL,
    ORDERCATEGORY varchar(2) NULL,
    CONSTRAINT ORDERHDR_PK PRIMARY KEY CLUSTERED (ORDERID)
)

CREATE TABLE dbo.ORDERLINE(
    ORDERNUMBER varchar(16) NOT NULL,
    RMPHASE char(1) NOT NULL,
    ORDERLINE char(2) NOT NULL,
    ORDERID varchar(8) NOT NULL,
    PRODUCT varchar(8) NOT NULL,
    CONSTRAINT ORDERLINE_PK PRIMARY KEY CLUSTERED (ORDERNUMBER,ORDERLINE,RMPHASE)
)

CREATE TABLE dbo.PRODUCT(
    PRODUCT varchar(8) NOT NULL,
    FACTORY varchar(4) NULL,
    CONSTRAINT PRODUCT_PK PRIMARY KEY CLUSTERED (PRODUCT)
)

Best Answer

There are three main technical reasons you get the plan you do:

  1. The optimizer's costing framework has no real support for non-inline functions. It does not make any attempt to look inside the function definition to see how expensive it might be, it just assigns a very small fixed cost, and estimates the function will produce 1 row of output each time it is called. Both these modelling assumptions are very often completely unsafe. The situation is very slightly improved in 2014 with the new cardinality estimator enabled since the fixed 1-row guess is replaced with a fixed 100-row guess. There is still no support for costing the content of non-inline functions, however.
  2. SQL Server initially collapses joins and applies into a single internal n-ary logical join. This helps the optimizer reason about join orders later on. Expanding the single n-ary join into candidate join orders comes later, and is largely based on heuristics. For example, inner joins come before outer joins, small tables and selective joins before large tables and less selective joins, and so on.
  3. When SQL Server performs cost-based optimization, it splits the effort into optional phases to minimize the chances of spending too long optimizing low-cost queries. There are three main phases, search 0, search 1, and search 2. Each phase has entry conditions, and later phases enable more optimizer explorations than earlier ones. Your query happens to qualify for the least-capable search phase, phase 0. A low enough cost plan is found there that later stages are not entered.

Given the small cardinality estimate assigned to the UDF apply, the n-ary join expansion heuristics unfortunately reposition it earlier in the tree than you would wish.

The query also qualifies for search 0 optimization by virtue of having at least three joins (including applies). The final physical plan you get, with the odd-looking scan, is based on that heuristically-deduced join order. It is costed low enough that the optimizer considers the plan "good enough". The low cost estimation and cardinality for the UDF contributes to this early finish.

Search 0 (also known as the Transaction Processing phase) targets low-cardinality OLTP-type queries, with final plans that usually feature nested loops joins. More importantly, search 0 runs only a relatively small subset of the optimizer's exploration abilities. This subset does not include pulling an apply up the query tree over a join (rule PullApplyOverJoin). This is exactly what is required in the test case to reposition the UDF apply above the joins, to appear last in the sequence of operations (as it were).

There is also an issue where the optimizer can decide between naive nested loops join (join predicate on the join itself) and a correlated indexed join (apply) where the correlated predicate is applied on the inner side of the join using an index seek. The latter is usually the desired plan shape, but the optimizer is capable of exploring both. With incorrect costing and cardinality estimates, it can choose the non-apply NL join, as in the submitted plans (explaining the scan).

So, there are multiple interacting reasons involving several general optimizer features that normally work well to find good plans in a short period of time without using excessive resources. Avoiding any one of the reasons is enough to produce the 'expected' plan shape for the sample query, even with empty tables:

Plan on empty tables with search 0 disabled

There is no supported way to avoid search 0 plan selection, early optimizer termination, or to improve the costing of UDFs (aside from the limited enhancements in the SQL Server 2014 CE model for this). This leaves things like plan guides, manual query rewrites (including the TOP (1) idea or using intermediate temporary tables) and avoiding poorly-costed 'black boxes' (from a QO point of view) like non-inline functions.

Rewriting CROSS APPLY as OUTER APPLY can also work, as it currently prevents some of the early join-collapsing work, but you have to be careful to preserve the original query semantics (e.g. rejecting any NULL-extended rows that might be introduced, without the optimizer collapsing back to a cross apply). You need to be aware though that this behaviour is not guaranteed to remain stable, so you would need to remember to retest any such observed behaviours each time you patch or upgrade the SQL Server.

Overall, the right solution for you depends on a variety of factors that we cannot judge for you. I would, however, encourage you to consider solutions that are guaranteed to always work in future, and that work with (rather than against) the optimizer wherever possible.