I created a scalar valued function that pauses for as many seconds as passed in. This function won't be used in production. I am trying to test how SQL server evaluates and creates plans if I add a UDF to a query. I am very puzzled by the difference in how it evaluates the last of my 3 queries. First here is the code to create the test function:
CREATE FUNCTION dbo.[sleep](@seconds int)
RETURNS datetime
as
BEGIN
DECLARE @sleepUntil datetime
DECLARE @dummy int
SET @sleepUntil = DATEADD(s, @seconds, getdate())
WHILE getdate() < @sleepUntil
SET @dummy = 0
RETURN getdate()
END
Now to create a table to test queries
CREATE TABLE #table (num int)
and to insert values
INSERT INTO #table
VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9)
The first query takes 2 seconds to run – which makes sense assuming that SQL server stores the value returned from the scalar function.
SELECT num
FROM #table t
WHERE dbo.sleep(2)=0
The second one still takes 2 seconds – so my assumption is that it does the same.
SELECT num
FROM #table t
WHERE EXISTS (select TOP 1 1
FROM #table t1 )
OR dbo.sleep(2)=0
The third one is puzzling – it takes 20 seconds to run, why would sql server not store the value here instead it runs it for each row? An interesting point I noticed was that when I clicked to see live query statistics it took 20 seconds to start showing the plan.
SELECT num
FROM #table t
WHERE EXISTS (select TOP 1 1
FROM #table t1
WHERE t.num = t1.num) dbo.sleep(2)=0
Best Answer
To answer a question like this you need to carefully inspect the plans. I asked and answered a similar question here. It's probably worth reading through, but the most important parts are that compute scalars can be deferred and SQL Server 2016 gives us the sys.dm_exec_function_stats DMV so that it's easier to see how many times a UDF executes in a query.
Your example UDF is nondeterministic which can change how many times it's executed in a query. The use of
getdate()
and the lack ofSCHEMABINDING
makes it nondeterministic. You can verify this within SQL Server as well:Let's go through your queries. For the first query the UDF call is in a startup expression predicate:
The filter operator is executed once over 9 rows. The startup expression predicate means that the UDF is only executed once instead of one time for every row processed by the filter. That's why this query only takes 2 seconds.
If I change the query to the following:
Now the query takes 18 seconds. In the plan we can see that the filter is no longer a startup expression predicate:
Now the UDF is executed once per row. If I create a completely new function that's deterministic:
Now the query plan changes again:
For this query, the UDF is only executed a single time. You can verify that through the UDF, by using extended events, or by making the function execution take a long time (but it needs to remain deterministic).
For the second query in the question:
The UDF is only executed per query for a different reason. The results of the
WHERE
clause do not reference tablet
and SQL Server chooses to effectively cache them into a spool:If you look at the actual plan you can see that the spool was executed 9 times but the filter was only executed once over a single row.
Your third query doesn't compile, but I assume that you meant this:
Now that you reference a column from
t
in theWHERE
clause there's no longer a spool in the plan:The UDF reference is a startup expression filter but it's still executed 9 times, once for each execution of the filter. If I change the query to use the deterministic
no_sleep
function then the UDF is only executed once, despite the query plan not changing at all.To summarize, functions can be executed once per query or once per row depending on the function definition as well as the query definition. Sometimes it can be difficult to figure out how many times the UDF will execute just by looking at the plan. If you need to guarantee that a function will only be executed once you should save the value to a local variable and use that local variable in your query, along with a
RECOMPILE
hint if you need the parameter embedding optimization.