Sql-server – Forcing correct execution plan when passing a table UDT to a table function

execution-planperformancequery-performancesql serversql-server-2012table-valued-parameters

I have a scalar function that returns a big XML, created over a bunch of invoices.

The exact list of invoices to feed to the function can be calculated using several different ways, but the function is same every time. For this reason, I declared a user-defined table type to contain primary keys from the eInvoice.Header table and pass it to the function. This way I can have several different functions to decide which invoices to process, and just one function that actually generates XML:

create function eInvoice.GetRelevantLinesInOneWay()
returns table ...

create function eInvoice.GetRelevantLinesInAnotherWay()
returns table ...

create function eInvoice.GetXML(@lines eInvoice.InvoicePrimaryKeys readonly)
returns xml
as
begin
    declare @x xml;

    with xmlnamespaces(N'important namespace' as pro)
    select @x = (
        select
            ...
        from
            eInvoice.Header h
            inner join @lines l on h.ST_PRIMARY = l.invoice_row_id
        for xml path(N'pro:Import'), type
    );

    return @x;
end;

Unfortunately this setup has proven to be very fragile.

Typically @lines contains about 150 rows (out of about 1m in eInvoce.Header). The correct execution plan is to use index seek on ST_PRIMARY, and that is what always happens when I execute the body of eInvoice.GetXML as an ad-hoc query.

However when I store it as a function, it works as expected for some little time, then something happens (too many rows in @lines, like about 300?), and it decides to change the execution plan to a full scan on eInvoice.Header and to keep it that way.

With the seek plan the function executes instantly, with the scan plan I don't know how much time it takes, I waited for 30 minutes and cancelled it.

I've tried various ways of forcing the seek plan on the function.

Adding with (forceseek) after eInvoce.Header seems to work when I click "Display estimated execution plan," but when I actually execute it, the hint is ignored and a scan is performed.
Adding option (recompile) inside the function or outside, in the calling code, does not seem to have an effect.
Capturing the execution plan and hardcoding it inside the function with option (use plan N'<plan>') does not work because @lines contains a primary key on its only column, and the name of that index is different for each instance of @lines, but the execution plan has to refer to the index by fixed name.
I tried deleting cached execution plans and recompiling the function several times. It sometimes helps for a very limited time and sometimes has no effect.

Is there a way to force the index seek?

Best Answer

Summary

The question does not provide execution plans or a full reproduction script, but based on the information given, you should use a FAST 1 hint. If you can, you should also consider converting the scalar function to the inline table-valued type.

AdventureWorks-based examples

Download the Microsoft sample database here.

Table type

CREATE TYPE dbo.PrimaryKeys AS TABLE
(
    PK integer PRIMARY KEY
);

Scalar function

You could try various combinations of INNER LOOP JOIN (with the written join order reversed, and FORCE ORDER added to suppress the warning), FORCESEEK(index_name(columns)) (without schemabinding) and even QUERYTRACEON(8690) to prevent a performance spool; but in my experience, FAST 1 is the most reliable way to get the right plan, and simpler as well.

CREATE FUNCTION dbo.GetXML
(
    @Lines dbo.PrimaryKeys READONLY
)
RETURNS xml
WITH SCHEMABINDING -- If possible
AS
BEGIN
    DECLARE @x xml;

    WITH XMLNAMESPACES (N'important namespace' AS pro)
    SELECT @x =
    (
        SELECT
            SOH.SalesOrderID,
            SOH.OrderDate,
            SOH.DueDate,
            SOH.ShipDate,
            SOH.[Status],
            SOH.PurchaseOrderNumber,
            SOH.AccountNumber,
            SOH.CustomerID,
            SOH.TotalDue
        FROM @Lines AS L
        JOIN Sales.SalesOrderHeader AS SOH
            ON SOH.SalesOrderID = L.PK
        ORDER BY
            SOH.SalesOrderID
        FOR XML PATH(N'pro:Import'), TYPE
    )
    OPTION (FAST 1);

    RETURN @x;
END;

In addition to all the usual reasons to avoid scalar functions, they do make plan analysis harder. First, you need to capture actual plans using Profiler or Extended Events - they do not appear in query tools like SSMS. Second, not all hints work as you would expect, for example RECOMPILE due to scoping and the separate plans.

DECLARE @Lines dbo.PrimaryKeys;

INSERT @Lines (PK)
SELECT TOP (300) SOH.SalesOrderID
FROM Sales.SalesOrderHeader AS SOH;

SELECT dbo.GetXML(@Lines);

The code above reliably produces the desired plan for me regardless of the number of rows in the table variable.

Without the FAST 1 hint in the function, the optimizer chooses a merge join and scan of both tables. This plan is cached and reused for a later function execution where TOP (300)is changed to TOP (1), reproducing the core of the issue.

Inline table-valued function

Unless there are absolutely compelling reasons to stick with the scalar function, you should probably rewrite it to inline table-valued form as follows:

CREATE FUNCTION dbo.GetXMLTable
(
    @Lines dbo.PrimaryKeys READONLY
)
RETURNS TABLE
AS
RETURN
    WITH XMLNAMESPACES (N'important namespace' AS pro)
    SELECT
        X.xml_result 
    FROM 
    (
        SELECT
            SOH.SalesOrderID,
            SOH.OrderDate,
            SOH.DueDate,
            SOH.ShipDate,
            SOH.[Status],
            SOH.PurchaseOrderNumber,
            SOH.AccountNumber,
            SOH.CustomerID,
            SOH.TotalDue
        FROM @Lines AS L
        CROSS APPLY
        (
            SELECT TOP (1) SOH.*
            FROM Sales.SalesOrderHeader AS SOH
                WITH (FORCESEEK(PK_SalesOrderHeader_SalesOrderID(SalesOrderID)))
            WHERE
                SOH.SalesOrderID = L.PK
            ORDER BY
                SOH.SalesOrderID
        ) AS SOH
        ORDER BY
            SOH.SalesOrderID
        FOR XML PATH(N'pro:Import'), TYPE
    ) AS X (xml_result);

The TOP (1) in the cross apply prevents the optimizer choosing anything other than nested loops join. The FORCESEEK hint is not required; I have included it to illustrate how it could be used if necessary.

DECLARE @Lines dbo.PrimaryKeys;

INSERT @Lines (PK)
SELECT TOP (300) SOH.SalesOrderID
FROM Sales.SalesOrderHeader AS SOH;

SELECT GXT.xml_result 
FROM dbo.GetXMLTable(@Lines) AS GXT;

The actual execution plan (without using Profiler/XE) is: