Sql-server – Efficiency of Scalar UDF vs TVF

functionsoptimizationset-returning-functionssql server

I am trying to optimize the rollup code for my company and ran into a very peculiar issue. I converted many Scalar functions to be TVFs and they all seem to run more quickly than the original, which is great. However, in the queries that call them, they end up running significantly slower than the original. Here is a basic outline of my update:

SELECT col1, ..., colx,
    (CASE WHEN x <= 0 OR y <= 0 OR z <= 0 OR z = x 
          THEN output
        WHEN valX <= 0 
          THEN output
        WHEN minimum.min < 1.0 THEN 1.0
        ELSE minimum.min
        END) AS Q,
FROM Tbl1...tblx (series of inner joins)
CROSS APPLY dbo.inlinemin(val1, val2) AS minimum

This is a basic outline of the original:

SELECT col1, ..., colx,
    (CASE WHEN x <= 0 OR y <= 0 OR z <= 0 OR z = x 
          THEN output
        WHEN valX <= 0 
          THEN output
        ELSE maximum(minimum(val1,val2),1.0)
        END) AS Q,
FROM Tbl1...tblx (series of inner joins)

The numbers are more or less the same, as is the logic. The only difference is my function 'inlineMin' is a TVF as opposed to 'maximum' and 'minimum' the original Scalar functions. These functions are exceedingly simple and just return the max or min between the two passed parameters. Even the execution plan is more or less the same. There is a change from merge join to hash match at one point, however, the cost of this difference is minimal and could not account for the drastic change in elapsed time and cpu time.

When I run the functions outside of the rollup query my function is faster than the original for large sets of data. This makes sense given how TVFs work in comparison to scalar UDFs. However, when I call them in the query, my updated version runs roughly 6x slower. The cross apply is (seemingly) not the issue since leaving the cross apply and simply using the old functions

SELECT ...
    ELSE maximum(minimum(val1,val2),1.0)
    END) AS Q,
FROM Tbl1...tblx (series of inner joins)
CROSS APPLY inlinemin(val1, val2) AS NotUsedHere

Is roughly as efficient as the original code. It is only when I include the output of my function in the select that the query becomes significantly slower.

As I understand, the function is called and runs at the cross apply, meaning that it should be calculating a value even if it is not in the select, so why would it be faster not to include it in the select? Further, if the above is false, why would my function itself be faster but run significantly slower when used inside of a query?

Edit:

Here is the Inline TVF that I have written to replace the original

CREATE FUNCTION [dbo].[InlineMin](@val1 FLOAT, @val2 FLOAT)
RETURNS TABLE WITH SCHEMABINDING 
AS
RETURN
    SELECT minVal =
    CASE    WHEN @val1 < @val2
            THEN @val1
    ELSE
            ISNULL(@val2,@val1)
END

Here is the anonymized query plan for my rewrite:
https://www.brentozar.com/pastetheplan/?id=ryKR_Q0Em

and the anonymized query plan for the original:
https://www.brentozar.com/pastetheplan/?id=SJsS1-A4X

Best Answer

As I understand, the function is called and runs at the cross apply, meaning that it should be calculating a value even if it is not in the select, so why would it be faster not to include it in the select?

The optimizer is very good at removing subtrees that compute expressions that are not needed in the final result (top-level projection). When you remove the value from the select list, the work needed to compute that value is simply not done.

Further, if the above is false, why would my function itself be faster but run significantly slower when used inside of a query?

This is difficult to assess in detail from an anonymized plan. Nevertheless, removing the scalar T-SQL functions allows the optimizer to consider parallel plans. You might like to test your rewritten query with an OPTION (MAXDOP 1) query hint to see how the serial plan selected compares with your original.

Parallel plans are not always better (though they are only selected if they appear to be lower cost to the optimizer). Your case has a relatively low-cost, so the optimizer does not believe it is worth exploring a tremendous number of alternatives. There are cases where the time spent considering both serial and parallel plans is counter-productive to final plan quality.

I do apologise if this is a little vague, but anonymized plans really do make it tough to be specific. All things being equal, in-line functions will currently out-perform scalar functions. Sadly, all things are rarely equal.