In my tests, no, adding WITH SCHEMABINDING
does not improve cardinality estimates. I created a simple table:
CREATE TABLE dbo.myobjects(id INT PRIMARY KEY);
INSERT dbo.myobjects SELECT [object_id] FROM sys.all_objects;
Then two functions:
CREATE FUNCTION dbo.noschemabinding(@UserID INT)
RETURNS @x TABLE (id INT)
AS
BEGIN
INSERT @x SELECT id FROM dbo.myobjects;
RETURN;
END
GO
CREATE FUNCTION dbo.withschemabinding(@UserID INT)
RETURNS @x TABLE (id INT)
WITH SCHEMABINDING
AS
BEGIN
INSERT @x SELECT id FROM dbo.myobjects;
RETURN;
END
GO
Comparing the actual plans, both show estimated rows = 1, actual rows = 2112 (this latter number may differ on your system depending on version/SP etc).
Comparing the speed:
SET NOCOUNT ON;
GO
SELECT SYSDATETIME();
GO
SELECT id INTO #x FROM dbo.noschemabinding(1);
DROP TABLE #x;
GO 1000
GO
SELECT SYSDATETIME();
GO
SELECT id INTO #x FROM dbo.withschemabinding(1);
DROP TABLE #x;
GO 1000
SELECT SYSDATETIME();
Results:
run 1 run 2
---------------- ------------------ ------------------
No schemabinding 14632 milliseconds 14079 milliseconds
Schemabinding 14251 milliseconds 13979 milliseconds
So, does it matter much? Nope.
SCHEMABINDING
in this case is used for a more important goal: underlying schema stability. You will probably have much better optimization opportunities if you pursue converting your function to an inline TVF than to chase down obscure plan-affecting differences in a multi-statement TVF.
If you must use a single query (as required by a single inline function), you can use one of the two options below (illustrated in my recent answer to Relating 2 tables with possible wildcards?):
Option 1
Use multiple APPLY
clauses with a startup condition for each using an outer reference from a previous apply in the chain. The efficiency of this method depends on the presence of startup filters in the execution plan. Correct results are guaranteed, but plan shape is not.
Option 2
Add an extra column with a constant literal to each clause of the union e.g. [Priority] = 1
then add an ORDER BY [Priority] ASC
at the TOP (1)
scope. Efficient operation depends on the plan avoiding sorts.
On reflection, this is not what you want in this case, because one row from each option is required by the merge concatenation in the plan. Nevertheless, it is an option in more general situations (where the alternate inputs produce more than one row, and the first row at low cost).
In addition:
Option 3
Since you're only returning a single row, you could use a multi-statement table-valued function instead, with explicit logic to try each option in order (in separate queries), returning as soon as the first result is found. This is guaranteed to produce correct results efficiently.
Note
The current function is tecnhically nondeterministic; SQL Server could evaluate the union all in any order it chooses, potentially returning a lower-priority result before evaluating a higher priority one.
Best Answer
The results of a multi-statement table-valued function (msTVF) are never cached or reused across statements (or connections), but there are a couple of ways that an msTVF result may be reused within the same statement. To that extent, an msTVF is not necessarily repopulated each time it is called.
Example msTVF
This (deliberately inefficient) msTVF returns a specified range of integers, with a timestamp on each row:
Static table variable
If all the parameters to the function call are constants (or runtime constants), the execution plan will populate the table variable result once. The remainder of the plan may access the table variable many times. The static nature of the table variable can be recognised from the execution plan. For example:
Returns a result similar to:
The execution plan is:
The Sequence operator first calls the Table Valued Function operator, which populates the table variable (note this operator returns no rows). Next, the Sequence calls its second input, which returns the contents of the table variable (using a Clustered Index Scan in this case).
The giveaway that the plan is using a 'static' table variable result is the Table Valued Function operator below a Sequence - the table variable needs to be fully populated once before the remainder of the plan can get going.
Multiple accesses
To show the table variable result being accessed more than once, we will use a second table with rows numbered from 1 to 5:
And a new query that joins this table to our function (this could equally be written as an
APPLY
):The result is:
The execution plan:
As before, the Sequence populates the table variable msTVF result first. Next, nested loops is used to join each row from table
T
to a row from the msTVF result. Since the function definition included a helpful index on the table variable, an index seek can be used.The key point is that when the parameters to the msTVF are constants (including variables & parameters) or treated as runtime constants for the statement by the execution engine, the plan will feature two separate operators for the msTVF table variable result: one to populate the table; another to access the results, possibly accessing the table multiple times, and possibly making use of indexes declared in the function definition.
Correlated parameters and non-constant parameters
To highlight the differences when correlated parameters (outer references) or non-constant function parameters are used, we will change the contents of table
T
so the function has much more work to do:The following modified query now uses an outer reference to table
T
in one of the function parameters:This query takes around 8 seconds to return results like:
Notice the time difference between rows in column
ts
. TheWHERE
clause limits the final result for a sensibly-sized output, but the inefficient function still takes a while to populate the table variable with 50,000-odd rows (depending on the correlated value ofi
from tableT
).The execution plan is:
Notice the lack of a Sequence operator. Now, there is a single Table Valued Function operator that populates the table variable and returns its rows on each iteration of the nested loops join.
To be clear: with just 5 rows in table T, the Table Valued Function operator runs 5 times. It generates 50,001 rows on the first iteration, 50,002 on the second...and so on. The table variable is 'thrown away' (truncated) between iterations, so each of the five calls is a full population. This is why it is so slow, and each row takes about the same time to appear in the result.
Side notes:
Naturally, the scenario above is deliberately contrived to show how poor performance can be when the msTVF populates many rows on each iteration.
A sensible implementation of the above code would set both msTVF parameters to
i
, and remove the redundantWHERE
clause. The table variable would still be truncated and repopulated on each iteration, but only with one row each time.We could also fetch the minimum and maximum
i
values fromT
and store them in variables in a prior step. Calling the function with variables instead of correlated parameters would allow the 'static' table variable pattern to be used as noted earlier.Caching for unchanged correlated parameters
Returning to address the original question once more, where the Sequence static pattern cannot be used, SQL Server can avoid truncating and repopulating the msTVF table variable if none of the correlated parameters have changed since the prior iteration of a nested loop join.
To demonstrate this, we will replace the contents of
T
with five identicali
values:The query with a correlated parameter again:
This time the results appear in around 1.5 seconds:
Note the identical timestamps on each row. The cached result in the table variable is reused for subsequent iterations where the correlated value
i
is unchanged. Reusing the result is much faster than inserting 50,005 rows each time.The execution plan looks very similar to before:
The key difference is in the Actual Rebinds and Actual Rewinds properties of the Table Valued Function operator:
When the correlated parameters do not change, SQL Server can replay (rewind) the current results in the table variable. When the correlation changes, SQL Server must truncate and repopulate the table variable (rebind). The one rebind happens on the first iteration; the four subsequent iterations are all rewinds since the value of
T.i
is unchanged.