Sql-server – Does SQL Server cache the result of a multi-statement table-valued function

execution-planset-returning-functionssql server

A multi-statement table-valued function returns its result in a table variable.

Are these results ever reused, or is the function always fully evaluated every time it is called?

Best Answer

The results of a multi-statement table-valued function (msTVF) are never cached or reused across statements (or connections), but there are a couple of ways that an msTVF result may be reused within the same statement. To that extent, an msTVF is not necessarily repopulated each time it is called.

Example msTVF

This (deliberately inefficient) msTVF returns a specified range of integers, with a timestamp on each row:

IF OBJECT_ID(N'dbo.IntegerRange', 'TF') IS NOT NULL
    DROP FUNCTION dbo.IntegerRange;
GO
CREATE FUNCTION dbo.IntegerRange (@From integer, @To integer)
RETURNS @T table 
(
    n integer PRIMARY KEY, 
    ts datetime DEFAULT CURRENT_TIMESTAMP
)
WITH SCHEMABINDING
AS
BEGIN
    WHILE @From <= @To
    BEGIN
        INSERT @T (n)
        VALUES (@From);

        SET @From = @From + 1;
    END;
    RETURN;
END;

Static table variable

If all the parameters to the function call are constants (or runtime constants), the execution plan will populate the table variable result once. The remainder of the plan may access the table variable many times. The static nature of the table variable can be recognised from the execution plan. For example:

SELECT
    IR.n,
    IR.ts 
FROM dbo.IntegerRange(1, 5) AS IR
ORDER BY
    IR.n;

Returns a result similar to:

The execution plan is:

The Sequence operator first calls the Table Valued Function operator, which populates the table variable (note this operator returns no rows). Next, the Sequence calls its second input, which returns the contents of the table variable (using a Clustered Index Scan in this case).

The giveaway that the plan is using a 'static' table variable result is the Table Valued Function operator below a Sequence - the table variable needs to be fully populated once before the remainder of the plan can get going.

Multiple accesses

To show the table variable result being accessed more than once, we will use a second table with rows numbered from 1 to 5:

IF OBJECT_ID(N'dbo.T', 'U') IS NOT NULL
    DROP TABLE dbo.T;

CREATE TABLE dbo.T (i integer NOT NULL);

INSERT dbo.T (i) 
VALUES (1), (2), (3), (4), (5);

And a new query that joins this table to our function (this could equally be written as an APPLY):

SELECT T.i,
       IR.n,
       IR.ts
FROM dbo.T AS T
JOIN dbo.IntegerRange(1, 5) AS IR
    ON IR.n = T.i;

The result is:

The execution plan:

As before, the Sequence populates the table variable msTVF result first. Next, nested loops is used to join each row from table T to a row from the msTVF result. Since the function definition included a helpful index on the table variable, an index seek can be used.

The key point is that when the parameters to the msTVF are constants (including variables & parameters) or treated as runtime constants for the statement by the execution engine, the plan will feature two separate operators for the msTVF table variable result: one to populate the table; another to access the results, possibly accessing the table multiple times, and possibly making use of indexes declared in the function definition.

Correlated parameters and non-constant parameters

To highlight the differences when correlated parameters (outer references) or non-constant function parameters are used, we will change the contents of table T so the function has much more work to do:

TRUNCATE TABLE dbo.T;

INSERT dbo.T (i) 
VALUES (50001), (50002), (50003), (50004), (50005);

The following modified query now uses an outer reference to table T in one of the function parameters:

SELECT T.i,
       IR.n,
       IR.ts
FROM dbo.T AS T
CROSS APPLY dbo.IntegerRange(1, T.i) AS IR
WHERE IR.n = T.i;

This query takes around 8 seconds to return results like:

Notice the time difference between rows in column ts. The WHERE clause limits the final result for a sensibly-sized output, but the inefficient function still takes a while to populate the table variable with 50,000-odd rows (depending on the correlated value of i from table T).

The execution plan is:

Notice the lack of a Sequence operator. Now, there is a single Table Valued Function operator that populates the table variable and returns its rows on each iteration of the nested loops join.

To be clear: with just 5 rows in table T, the Table Valued Function operator runs 5 times. It generates 50,001 rows on the first iteration, 50,002 on the second...and so on. The table variable is 'thrown away' (truncated) between iterations, so each of the five calls is a full population. This is why it is so slow, and each row takes about the same time to appear in the result.

Side notes:

Naturally, the scenario above is deliberately contrived to show how poor performance can be when the msTVF populates many rows on each iteration.

A sensible implementation of the above code would set both msTVF parameters to i, and remove the redundant WHERE clause. The table variable would still be truncated and repopulated on each iteration, but only with one row each time.

We could also fetch the minimum and maximum i values from T and store them in variables in a prior step. Calling the function with variables instead of correlated parameters would allow the 'static' table variable pattern to be used as noted earlier.

Caching for unchanged correlated parameters

Returning to address the original question once more, where the Sequence static pattern cannot be used, SQL Server can avoid truncating and repopulating the msTVF table variable if none of the correlated parameters have changed since the prior iteration of a nested loop join.

To demonstrate this, we will replace the contents of T with five identical i values:

TRUNCATE TABLE dbo.T;

INSERT dbo.T (i) 
VALUES (50005), (50005), (50005), (50005), (50005);

The query with a correlated parameter again:

SELECT T.i,
       IR.n,
       IR.ts
FROM dbo.T AS T
CROSS APPLY dbo.IntegerRange(1, T.i) AS IR
WHERE IR.n = T.i;

This time the results appear in around 1.5 seconds:

Note the identical timestamps on each row. The cached result in the table variable is reused for subsequent iterations where the correlated value i is unchanged. Reusing the result is much faster than inserting 50,005 rows each time.

The execution plan looks very similar to before:

The key difference is in the Actual Rebinds and Actual Rewinds properties of the Table Valued Function operator:

When the correlated parameters do not change, SQL Server can replay (rewind) the current results in the table variable. When the correlation changes, SQL Server must truncate and repopulate the table variable (rebind). The one rebind happens on the first iteration; the four subsequent iterations are all rewinds since the value of T.i is unchanged.

Option 1

Use multiple APPLY clauses with a startup condition for each using an outer reference from a previous apply in the chain. The efficiency of this method depends on the presence of startup filters in the execution plan. Correct results are guaranteed, but plan shape is not.

Option 2

Add an extra column with a constant literal to each clause of the union e.g. [Priority] = 1 then add an ORDER BY [Priority] ASC at the TOP (1) scope. Efficient operation depends on the plan avoiding sorts.

On reflection, this is not what you want in this case, because one row from each option is required by the merge concatenation in the plan. Nevertheless, it is an option in more general situations (where the alternate inputs produce more than one row, and the first row at low cost).

In addition:

Option 3

Since you're only returning a single row, you could use a multi-statement table-valued function instead, with explicit logic to try each option in order (in separate queries), returning as soon as the first result is found. This is guaranteed to produce correct results efficiently.

Note

The current function is tecnhically nondeterministic; SQL Server could evaluate the union all in any order it chooses, potentially returning a lower-priority result before evaluating a higher priority one.

Best Answer

Example msTVF

Static table variable

Multiple accesses

Correlated parameters and non-constant parameters

Caching for unchanged correlated parameters

Related Solutions

Sql-server – Does WITH SCHEMABINDING on a multi-statement TVF improve cardinality estimates

Sql-server – UNION ALL or other ways to return first row of result-set

Option 1

Option 2

Option 3

Related Question