Sql-server – Scalar function value not being stored

functionsoptimizationsql server

I created a scalar valued function that pauses for as many seconds as passed in. This function won't be used in production. I am trying to test how SQL server evaluates and creates plans if I add a UDF to a query. I am very puzzled by the difference in how it evaluates the last of my 3 queries. First here is the code to create the test function:

CREATE FUNCTION dbo.[sleep](@seconds int)
RETURNS datetime
as
BEGIN

DECLARE @sleepUntil datetime
DECLARE @dummy int
SET @sleepUntil = DATEADD(s, @seconds, getdate())
WHILE getdate() < @sleepUntil
    SET @dummy = 0
RETURN getdate()
END

Now to create a table to test queries

CREATE TABLE #table (num int)

and to insert values

INSERT INTO #table 
VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9)

The first query takes 2 seconds to run – which makes sense assuming that SQL server stores the value returned from the scalar function.

   SELECT num
   FROM #table t
   WHERE dbo.sleep(2)=0 

The second one still takes 2 seconds – so my assumption is that it does the same.

 SELECT num
   FROM #table t
   WHERE EXISTS (select TOP 1 1 
                FROM  #table t1 ) 
   OR dbo.sleep(2)=0 

The third one is puzzling – it takes 20 seconds to run, why would sql server not store the value here instead it runs it for each row? An interesting point I noticed was that when I clicked to see live query statistics it took 20 seconds to start showing the plan.

SELECT num
   FROM #table t
   WHERE EXISTS (select TOP 1 1 
                FROM  #table t1 
                WHERE t.num = t1.num) dbo.sleep(2)=0 

Best Answer

To answer a question like this you need to carefully inspect the plans. I asked and answered a similar question here. It's probably worth reading through, but the most important parts are that compute scalars can be deferred and SQL Server 2016 gives us the sys.dm_exec_function_stats DMV so that it's easier to see how many times a UDF executes in a query.

Your example UDF is nondeterministic which can change how many times it's executed in a query. The use of getdate() and the lack of SCHEMABINDING makes it nondeterministic. You can verify this within SQL Server as well:

SELECT OBJECTPROPERTY(OBJECT_ID('dbo.[sleep]'), 'IsDeterministic')

Let's go through your queries. For the first query the UDF call is in a startup expression predicate:

first query startup

The filter operator is executed once over 9 rows. The startup expression predicate means that the UDF is only executed once instead of one time for every row processed by the filter. That's why this query only takes 2 seconds.

If I change the query to the following:

SELECT num
FROM #table t
WHERE dbo.sleep(2)=DATEADD(DAY, num, 0);

Now the query takes 18 seconds. In the plan we can see that the filter is no longer a startup expression predicate:

query 1 change

Now the UDF is executed once per row. If I create a completely new function that's deterministic:

CREATE FUNCTION dbo.[no_sleep] (@seconds int)
RETURNS datetime
WITH SCHEMABINDING
as
BEGIN
    RETURN DATEADD(DAY, 0, 0);
END;

Now the query plan changes again:

query plan deter

For this query, the UDF is only executed a single time. You can verify that through the UDF, by using extended events, or by making the function execution take a long time (but it needs to remain deterministic).

For the second query in the question:

SELECT num
   FROM #table t
   WHERE EXISTS (select TOP 1 1 
                FROM  #table t1 ) 
   OR dbo.sleep(2)=0;

The UDF is only executed per query for a different reason. The results of the WHERE clause do not reference table t and SQL Server chooses to effectively cache them into a spool:

second query

If you look at the actual plan you can see that the spool was executed 9 times but the filter was only executed once over a single row.

Your third query doesn't compile, but I assume that you meant this:

SELECT num
   FROM #table t
   WHERE EXISTS (select TOP 1 1 
                FROM  #table t1 
                WHERE t.num = t1.num) OR dbo.sleep(2)=0; 

Now that you reference a column from t in the WHERE clause there's no longer a spool in the plan:

third query

The UDF reference is a startup expression filter but it's still executed 9 times, once for each execution of the filter. If I change the query to use the deterministic no_sleep function then the UDF is only executed once, despite the query plan not changing at all.

To summarize, functions can be executed once per query or once per row depending on the function definition as well as the query definition. Sometimes it can be difficult to figure out how many times the UDF will execute just by looking at the plan. If you need to guarantee that a function will only be executed once you should save the value to a local variable and use that local variable in your query, along with a RECOMPILE hint if you need the parameter embedding optimization.