SQL Server – Self Referencing Scalar Function Nesting Level Exceeded

functionssql serversql-server-2017

Purpose

When trying to create a test example of a self referencing function, one version fails while another one succeeds.

The only difference being an added SELECT to the function body resulting in a different execution plan for both.

The function that works

CREATE FUNCTION dbo.test5(@i int)
RETURNS INT
AS 
BEGIN
RETURN(
SELECT TOP 1
CASE 
WHEN @i = 1 THEN 1
WHEN @i = 2 THEN 2
WHEN @i = 3 THEN  dbo.test5(1) + dbo.test5(2)
END
)
END;

Calling the function

SELECT dbo.test5(3);

Returns

(No column name)
3

The function that does not work

CREATE FUNCTION dbo.test6(@i int)
RETURNS INT
AS 
BEGIN
RETURN(
SELECT TOP 1
CASE 
WHEN @i = 1 THEN 1
WHEN @i = 2 THEN 2
WHEN @i = 3 THEN (SELECT dbo.test6(1) + dbo.test6(2))
END
)END;

Calling the function

SELECT dbo.test6(3);

SELECT dbo.test6(2);

Results in the error

Maximum stored procedure, function, trigger, or view nesting level
exceeded (limit 32).

Guessing the cause

There is an additional compute scalar on the estimated plan of the failed function, calling

<ColumnReference Column="Expr1002" />
<ScalarOperator ScalarString="CASE WHEN [@i]=(1) THEN (1) ELSE CASE WHEN [@i]=(2) THEN (2) ELSE CASE WHEN [@i]=(3) THEN [Expr1000] ELSE NULL END END END">

And expr1000 being

<ColumnReference Column="Expr1000" />
<ScalarOperator ScalarString="[dbo].[test6]((1))+[dbo].[test6]((2))">

Which could explain the recursive references exceeding 32.

The actual question

The added SELECT makes the function call itself over and over, resulting in an endless loop, but why is adding a SELECT giving this result?

Additional info

Estimated execution plans

DB<>Fiddle

Build version:
14.0.3045.24

Tested on compatibility_levels 100 and 140

Best Answer

This is a bug in project normalization, exposed by using a subquery inside a case expression with a non-deterministic function.

To explain, we need to note two things up front:

SQL Server cannot execute subqueries directly, so they are always unrolled or converted to an apply.
The semantics of CASE are such that a THEN expression should only be evaluated if the WHEN clause returns true.

The (trivial) subquery introduced in the problematic case therefore results in an apply operator (nested loops join). To meet the second requirement, SQL Server initially places the expression dbo.test6(1) + dbo.test6(2) on the inner side of the apply:

[Expr1000] = Scalar Operator([dbo].[test6]((1))+[dbo].[test6]((2)))

...with the CASE semantics honoured by a pass-through predicate on the join:

[@i]=(1) OR [@i]=(2) OR IsFalseOrNull [@i]=(3)

The inner side of the loop is only evaluated if the pass-through condition evaluates to false (meaning @i = 3). This is all correct so far. The Compute Scalar following the nested loops join also honours the CASE semantics correctly:

[Expr1001] = Scalar Operator(CASE WHEN [@i]=(1) THEN (1) ELSE CASE WHEN [@i]=(2) THEN (2) ELSE CASE WHEN [@i]=(3) THEN [Expr1000] ELSE NULL END END END)

The problem is that the project normalization stage of query compilation sees that Expr1000 is uncorrelated and determines that it would be safe (narrator: it isn't) to move it outside the loop:

[Expr1000] = Scalar Operator([dbo].[test6]((1))+[dbo].[test6]((2)))

This breaks* the semantics implemented by the pass-through predicate, so the function is evaluated when it should not be, and an infinite loop results.

You should report this bug. A workaround is to prevent the expression being moved outside the apply by making it correlated (i.e. including @i in the expression) but this is a hack of course. There is a way to disable project normalization, but I have been asked before not to share it publicly, so I won't.

This problem does not arise in SQL Server 2019 when the scalar function is inlined, because the inlining logic operates directly on the parsed tree (well before project normalization). The simple logic in the question can be simplified by the inlining logic to the non-recursive:

[Expr1019] = (Scalar Operator((1)))
[Expr1045] = Scalar Operator(CONVERT_IMPLICIT(int,CONVERT_IMPLICIT(int,[Expr1019],0)+(2),0))

...which returns 3.

Another way to illustrate the core issue is:

-- Not schema bound to make it non-det
CREATE OR ALTER FUNCTION dbo.Error() 
RETURNS integer 
-- WITH INLINE = OFF -- SQL Server 2019 only
AS
BEGIN
    RETURN 1/0;
END;
GO
DECLARE @i integer = 1;

SELECT
    CASE 
        WHEN @i = 1 THEN 1
        WHEN @i = 2 THEN 2
        WHEN @i = 3 THEN (SELECT dbo.Error()) -- 'subquery'
        ELSE NULL
    END;

Reproduces on the latest builds of all versions from 2008 R2 to 2019 CTP 3.0.

A further example (without a scalar function) provided by Martin Smith:

SELECT IIF(@@TRANCOUNT >= 0, 1, (SELECT CRYPT_GEN_RANDOM(4)/ 0))

This has all the key elements needed:

CASE (implemented internally as ScaOp_IIF)
A non-deterministic function (CRYPT_GEN_RANDOM)
A subquery on the branch that should not be executed ((SELECT ...))

*Strictly, the above transformation could still be correct if evaluation of Expr1000 was deferred correctly, since it is referenced only by the safe construction:

[Expr1002] = Scalar Operator(CASE WHEN [@i]=(1) THEN (1) ELSE CASE WHEN [@i]=(2) THEN (2) ELSE CASE WHEN [@i]=(3) THEN [Expr1000] ELSE NULL END END END)

...but this requires an internal ForceOrder flag (not query hint), which is not set either. In any case, the implementation of the logic applied by project normalization is incorrect or incomplete.

Bug report on the Azure Feedback site for SQL Server.

Related Solutions

Sql-server – Does WITH SCHEMABINDING on a multi-statement TVF improve cardinality estimates

In my tests, no, adding WITH SCHEMABINDING does not improve cardinality estimates. I created a simple table:

CREATE TABLE dbo.myobjects(id INT PRIMARY KEY);

INSERT dbo.myobjects SELECT [object_id] FROM sys.all_objects;

Then two functions:

CREATE FUNCTION dbo.noschemabinding(@UserID INT)
RETURNS @x TABLE (id INT)
AS
BEGIN
  INSERT @x SELECT id FROM dbo.myobjects;

  RETURN;
END
GO

CREATE FUNCTION dbo.withschemabinding(@UserID INT)
RETURNS @x TABLE (id INT)
WITH SCHEMABINDING
AS
BEGIN
  INSERT @x SELECT id FROM dbo.myobjects;

  RETURN;
END
GO

Comparing the actual plans, both show estimated rows = 1, actual rows = 2112 (this latter number may differ on your system depending on version/SP etc).

Comparing the speed:

SET NOCOUNT ON;
GO
SELECT SYSDATETIME();
GO
SELECT id INTO #x FROM dbo.noschemabinding(1);
DROP TABLE #x;
GO 1000
GO
SELECT SYSDATETIME();
GO
SELECT id INTO #x FROM dbo.withschemabinding(1);
DROP TABLE #x;
GO 1000
SELECT SYSDATETIME();

Results:

                    run 1               run 2
----------------    ------------------  ------------------
No schemabinding    14632 milliseconds  14079 milliseconds
Schemabinding       14251 milliseconds  13979 milliseconds

So, does it matter much? Nope.

SCHEMABINDING in this case is used for a more important goal: underlying schema stability. You will probably have much better optimization opportunities if you pursue converting your function to an inline TVF than to chase down obscure plan-affecting differences in a multi-statement TVF.

Sql-server – Oracle GoldenGate add trandata errors

I found out what the problem is, it seems that GoldenGate doesn't work with SQL Express. The server I was connecting to is SQL Express, I'll need to use the Enterprise Edition.

Best Answer

Related Solutions

Sql-server – Does WITH SCHEMABINDING on a multi-statement TVF improve cardinality estimates

Sql-server – Oracle GoldenGate add trandata errors

Related Question