Sql-server – How to force a scalar UDF to be evaluated just once in a query

functionsperformancequery-performancesql server

I have a query which needs to filter against the result of a scalar UDF. The query must be sent as a single statement (so I can't assign the UDF result to a local variable) and I cannot use a TVF. I am aware of the performance issues caused by scalar UDFs, which include forcing the entire plan to be run serially, excessive memory grants, cardinality estimation problems, and lack of inlining. For this question please assume that I need to use a scalar UDF.

The UDF itself is pretty expensive to call but in theory the queries can be logically implemented by the optimizer in such a way that the function only needs to be calculated once. I have mocked up a greatly simplified example for this question. The following query takes 6152 ms to execute on my machine:

SELECT x1.ID
FROM dbo.X_100_INTEGERS x1
WHERE x1.ID >= dbo.EXPENSIVE_UDF();

The filter operator in the query plan suggests that the function was evaluated once for each row:

query plan 1

DDL and data prep:

CREATE OR ALTER FUNCTION dbo.EXPENSIVE_UDF () RETURNS INT
AS
BEGIN
    DECLARE @tbl TABLE (VAL VARCHAR(5));

    -- make the function expensive to call
    INSERT INTO @tbl
    SELECT [VALUE]
    FROM STRING_SPLIT(REPLICATE(CAST('Z ' AS VARCHAR(MAX)), 20000), ' ');

    RETURN 1;
END;

GO

DROP TABLE IF EXISTS dbo.X_100_INTEGERS;

CREATE TABLE dbo.X_100_INTEGERS (ID INT NOT NULL);

-- insert 100 integers from 1 - 100
WITH
    L0   AS(SELECT 1 AS c UNION ALL SELECT 1),
    L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
    L2   AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
    L3   AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
    L4   AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
    L5   AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),
    Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L5)
INSERT INTO dbo.X_100_INTEGERS WITH (TABLOCK)
SELECT n FROM Nums WHERE n <= 100;

Here is a db fiddle link for the above example, although the code takes about 18 seconds to execute there.

In some cases I may not be able to edit the function's code because it is supplied by a vendor. In other cases I am able to make changes. How can I force a scalar UDF to be evaluated just once in a query?

Best Answer

Ultimately, it is not possible to force SQL Server to evaluate a scalar UDF just once in a query. However, there are some steps which can be taken to encourage it. With testing I believe that you can get something that works with the current version of SQL Server, but it's possible that future changes will require you to revisit your code.

If it's possible to edit the code a good first thing to try is to make the function deterministic if possible. Paul White points out here that the function must be created with the SCHEMABINDING option and the function code itself must be deterministic.

After making the following change:

CREATE OR ALTER FUNCTION dbo.EXPENSIVE_UDF () RETURNS INT
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @tbl TABLE (VAL VARCHAR(5));

    -- make the function expensive to call
    INSERT INTO @tbl
    SELECT [VALUE]
    FROM STRING_SPLIT(REPLICATE(CAST('Z ' AS VARCHAR(MAX)), 20000), ' ');

    RETURN 1;
END;

The query from the question is executed in 64 ms:

SELECT x1.ID
FROM dbo.X_100_INTEGERS x1
WHERE x1.ID >= dbo.EXPENSIVE_UDF();

The query plan no longer has the filter operator:

query plan 1

To be sure that it executed only once we can use the new sys.dm_exec_function_stats DMV released in SQL Server 2016:

SELECT execution_count
FROM sys.dm_exec_function_stats
WHERE object_id = OBJECT_ID('EXPENSIVE_UDF', 'FN');

Issuing an ALTER against the function will reset the execution_count for that object. The above query returns 1 which means the function was only executed once.

Note that just because the function is deterministic does not mean that it will be evaluated only once for any query. In fact, for some queries adding SCHEMABINDING can degrade performance. Consider the following query:

WITH cte (UDF_VALUE) AS
(
    SELECT DISTINCT dbo.EXPENSIVE_UDF() UDF_VALUE
)
SELECT ID
FROM dbo.X_100_INTEGERS
INNER JOIN cte ON ID >= cte.UDF_VALUE;

The superfluous DISTINCT was added to get rid of a Filter operator. The plan looks promising:

query plan 2

Based on that, one would expect the UDF to be evaluated once and to be used as the outer table in the nested loop join. However, the query takes 6446 ms to run on my machine. According to sys.dm_exec_function_stats the function was executed 100 times. How that is that possible? In "Compute Scalars, Expressions and Execution Plan Performance", Paul White points out that the Compute Scalar operator can be deferred:

More often than not, a Compute Scalar simply defines an expression; the actual computation is deferred until something later in the execution plan needs the result.

For this query it looks like the UDF call was deferred until it was needed, at which point it was evaluated 100 times.

Interestingly, the CTE example executes in 71 ms on my machine when the UDF is not defined with SCHEMABINDING, as in the original question. The function is only executed once when the query is run. Here is the query plan for that:

query plan 3

It's not clear why the Compute Scalar isn't deferred. It could be because the nondeterminism of the function limits the rearranging of operators that the query optimizer can do.

An alternative approach is to add a small table to the CTE and to query the only row in that table. Any small table will do, but let's use the following:

CREATE TABLE dbo.X_ONE_ROW_TABLE (ID INT NOT NULL);

INSERT INTO dbo.X_ONE_ROW_TABLE VALUES (1);

The query then becomes:

WITH cte (UDF_VALUE) AS
(       
    SELECT DISTINCT dbo.EXPENSIVE_UDF() UDF_VALUE
    FROM dbo.X_ONE_ROW_TABLE
)
SELECT ID
FROM dbo.X_100_INTEGERS
INNER JOIN cte ON ID >= cte.UDF_VALUE;

The addition of the dbo.X_ONE_ROW_TABLE adds uncertainty for the optimizer. If the table has zero rows then the CTE will return 0 rows. In any case, the optimizer cannot guarantee that the CTE will return one row if the UDF is not deterministic, so it seems likely that the UDF will be evaluated before the join. I would expect the optimizer to scan dbo.X_ONE_ROW_TABLE, use a stream aggregate to get the maximum value of the one row returned (which requires the function to be evaluated), and to use that as the outer table for a nested loop join to dbo.X_100_INTEGERS in the main query. This appears to be what happens:

query plan 4

The query executes in about 110 ms on my machine and the UDF is evaluated only once according to sys.dm_exec_function_stats. It would be incorrect to say that the query optimizer is forced to evaluate the UDF only once. However, it is hard to imagine an optimizer rewrite that would lead to a lower cost query, even with the limitations around UDF and compute scalar costing.

In summary, for deterministic functions (which must include the SCHEMABINDING option) try writing the query in as simple of a way as possible. If on SQL Server 2016 or a later version, confirm that the function was only executed once using sys.dm_exec_function_stats. Execution plans can be misleading in that regard.

For functions not considered by SQL Server to be deterministic, including anything lacking the SCHEMABINDING option, one approach is to put the UDF in a carefully crafted CTE or derived table. This requires a little care but the same CTE can work for both deterministic and nondeterministic functions.