Sql-server – Forcing correct execution plan when passing a table UDT to a table function

execution-planperformancequery-performancesql serversql-server-2012table-valued-parameters

I have a scalar function that returns a big XML, created over a bunch of invoices.

The exact list of invoices to feed to the function can be calculated using several different ways, but the function is same every time. For this reason, I declared a user-defined table type to contain primary keys from the eInvoice.Header table and pass it to the function. This way I can have several different functions to decide which invoices to process, and just one function that actually generates XML:

create function eInvoice.GetRelevantLinesInOneWay()
returns table ...

create function eInvoice.GetRelevantLinesInAnotherWay()
returns table ...

create function eInvoice.GetXML(@lines eInvoice.InvoicePrimaryKeys readonly)
returns xml
as
begin
    declare @x xml;

    with xmlnamespaces(N'important namespace' as pro)
    select @x = (
        select
            ...
        from
            eInvoice.Header h
            inner join @lines l on h.ST_PRIMARY = l.invoice_row_id
        for xml path(N'pro:Import'), type
    );

    return @x;
end;

Unfortunately this setup has proven to be very fragile.

Typically @lines contains about 150 rows (out of about 1m in eInvoce.Header). The correct execution plan is to use index seek on ST_PRIMARY, and that is what always happens when I execute the body of eInvoice.GetXML as an ad-hoc query.

However when I store it as a function, it works as expected for some little time, then something happens (too many rows in @lines, like about 300?), and it decides to change the execution plan to a full scan on eInvoice.Header and to keep it that way.

With the seek plan the function executes instantly, with the scan plan I don't know how much time it takes, I waited for 30 minutes and cancelled it.

I've tried various ways of forcing the seek plan on the function.

  • Adding with (forceseek) after eInvoce.Header seems to work when I click "Display estimated execution plan," but when I actually execute it, the hint is ignored and a scan is performed.
  • Adding option (recompile) inside the function or outside, in the calling code, does not seem to have an effect.
  • Capturing the execution plan and hardcoding it inside the function with option (use plan N'<plan>') does not work because @lines contains a primary key on its only column, and the name of that index is different for each instance of @lines, but the execution plan has to refer to the index by fixed name.
  • I tried deleting cached execution plans and recompiling the function several times. It sometimes helps for a very limited time and sometimes has no effect.

Is there a way to force the index seek?

Best Answer

Summary

The question does not provide execution plans or a full reproduction script, but based on the information given, you should use a FAST 1 hint. If you can, you should also consider converting the scalar function to the inline table-valued type.

AdventureWorks-based examples

Download the Microsoft sample database here.

Table type

CREATE TYPE dbo.PrimaryKeys AS TABLE
(
    PK integer PRIMARY KEY
);

Scalar function

You could try various combinations of INNER LOOP JOIN (with the written join order reversed, and FORCE ORDER added to suppress the warning), FORCESEEK(index_name(columns)) (without schemabinding) and even QUERYTRACEON(8690) to prevent a performance spool; but in my experience, FAST 1 is the most reliable way to get the right plan, and simpler as well.

CREATE FUNCTION dbo.GetXML
(
    @Lines dbo.PrimaryKeys READONLY
)
RETURNS xml
WITH SCHEMABINDING -- If possible
AS
BEGIN
    DECLARE @x xml;

    WITH XMLNAMESPACES (N'important namespace' AS pro)
    SELECT @x =
    (
        SELECT
            SOH.SalesOrderID,
            SOH.OrderDate,
            SOH.DueDate,
            SOH.ShipDate,
            SOH.[Status],
            SOH.PurchaseOrderNumber,
            SOH.AccountNumber,
            SOH.CustomerID,
            SOH.TotalDue
        FROM @Lines AS L
        JOIN Sales.SalesOrderHeader AS SOH
            ON SOH.SalesOrderID = L.PK
        ORDER BY
            SOH.SalesOrderID
        FOR XML PATH(N'pro:Import'), TYPE
    )
    OPTION (FAST 1);

    RETURN @x;
END;

In addition to all the usual reasons to avoid scalar functions, they do make plan analysis harder. First, you need to capture actual plans using Profiler or Extended Events - they do not appear in query tools like SSMS. Second, not all hints work as you would expect, for example RECOMPILE due to scoping and the separate plans.

DECLARE @Lines dbo.PrimaryKeys;

INSERT @Lines (PK)
SELECT TOP (300) SOH.SalesOrderID
FROM Sales.SalesOrderHeader AS SOH;

SELECT dbo.GetXML(@Lines);

The code above reliably produces the desired plan for me regardless of the number of rows in the table variable.

Without the FAST 1 hint in the function, the optimizer chooses a merge join and scan of both tables. This plan is cached and reused for a later function execution where TOP (300)is changed to TOP (1), reproducing the core of the issue.

Inline table-valued function

Unless there are absolutely compelling reasons to stick with the scalar function, you should probably rewrite it to inline table-valued form as follows:

CREATE FUNCTION dbo.GetXMLTable
(
    @Lines dbo.PrimaryKeys READONLY
)
RETURNS TABLE
AS
RETURN
    WITH XMLNAMESPACES (N'important namespace' AS pro)
    SELECT
        X.xml_result 
    FROM 
    (
        SELECT
            SOH.SalesOrderID,
            SOH.OrderDate,
            SOH.DueDate,
            SOH.ShipDate,
            SOH.[Status],
            SOH.PurchaseOrderNumber,
            SOH.AccountNumber,
            SOH.CustomerID,
            SOH.TotalDue
        FROM @Lines AS L
        CROSS APPLY
        (
            SELECT TOP (1) SOH.*
            FROM Sales.SalesOrderHeader AS SOH
                WITH (FORCESEEK(PK_SalesOrderHeader_SalesOrderID(SalesOrderID)))
            WHERE
                SOH.SalesOrderID = L.PK
            ORDER BY
                SOH.SalesOrderID
        ) AS SOH
        ORDER BY
            SOH.SalesOrderID
        FOR XML PATH(N'pro:Import'), TYPE
    ) AS X (xml_result);

The TOP (1) in the cross apply prevents the optimizer choosing anything other than nested loops join. The FORCESEEK hint is not required; I have included it to illustrate how it could be used if necessary.

DECLARE @Lines dbo.PrimaryKeys;

INSERT @Lines (PK)
SELECT TOP (300) SOH.SalesOrderID
FROM Sales.SalesOrderHeader AS SOH;

SELECT GXT.xml_result 
FROM dbo.GetXMLTable(@Lines) AS GXT;

The actual execution plan (without using Profiler/XE) is:

iTVF Execution Plan