Sql-server – Does WITH SCHEMABINDING on a multi-statement TVF improve cardinality estimates

Tags: cardinality-estimates, functions, performance, sql-server, sql-server-2005

According to http://blogs.msdn.com/b/psssql/archive/2010/10/28/query-performance-and-multi-statement-table-valued-functions.aspx and other articles, SQL Server assumes that a multi-statement table-valued function returns one row. If the function actually returns many rows, this estimate can lead to a poor execution plan for the calling statement.

Does adding WITH SCHEMABINDING after the RETURNS clause of CREATE FUNCTION result in a more accurate cardinality estimate for the function's return value?

Suppose we pass a UserId to this function and get back a table of the RecordId values that the user is allowed to access. Some users can see only a few records, while others can see many or even all of them. Would either the function or the calling statements (or the procedures that contain them) benefit from forcing a recompile (WITH RECOMPILE on the procedure or OPTION (RECOMPILE) on the statement)? Does using WITH SCHEMABINDING in the function change the answer?

I realize that I could figure this out by experimentation, but I am hoping that someone has already figured out the answer. A pointer to someplace where this is well documented would be helpful.

Best Answer

In my tests, no, adding WITH SCHEMABINDING does not improve cardinality estimates. I created a simple table:

CREATE TABLE dbo.myobjects(id INT PRIMARY KEY);

INSERT dbo.myobjects SELECT [object_id] FROM sys.all_objects;

Then two functions:

CREATE FUNCTION dbo.noschemabinding(@UserID INT)
RETURNS @x TABLE (id INT)
AS
BEGIN
  INSERT @x SELECT id FROM dbo.myobjects;

  RETURN;
END
GO

CREATE FUNCTION dbo.withschemabinding(@UserID INT)
RETURNS @x TABLE (id INT)
WITH SCHEMABINDING
AS
BEGIN
  INSERT @x SELECT id FROM dbo.myobjects;

  RETURN;
END
GO

In the actual plans, both functions show estimated rows = 1 and actual rows = 2112 (the latter number may differ on your system depending on version, service pack, and so on).
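
One way to repeat this comparison without opening the graphical plan is SET STATISTICS PROFILE, which returns each executed plan as a result set that includes a Rows (actual) column and an EstimateRows column. This is only a minimal sketch, and it assumes the two functions above have already been created:

```sql
-- Return the actual plan as an extra result set after each statement
SET STATISTICS PROFILE ON;
GO
-- Multi-statement TVF without SCHEMABINDING
SELECT id FROM dbo.noschemabinding(1);
-- Multi-statement TVF with SCHEMABINDING
SELECT id FROM dbo.withschemabinding(1);
GO
SET STATISTICS PROFILE OFF;
GO
```

For each statement, compare Rows with EstimateRows on the plan rows that read from the function's result.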

Comparing the speed:

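SET NOCOUNT ON;
GO
-- Start time. SYSDATETIME() requires SQL Server 2008 or later; use GETDATE() on 2005.
SELECT SYSDATETIME();
GO
-- 1000 executions without SCHEMABINDING
SELECT id INTO #x FROM dbo.noschemabinding(1);
DROP TABLE #x;
GO 1000
-- Time between the two tests
SELECT SYSDATETIME();
GO
-- 1000 executions with SCHEMABINDING
SELECT id INTO #x FROM dbo.withschemabinding(1);
DROP TABLE #x;
GO 1000
-- End time
SELECT SYSDATETIME();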

Results:

                    run 1               run 2
----------------    ------------------  ------------------
No schemabinding    14632 milliseconds  14079 milliseconds
Schemabinding       14251 milliseconds  13979 milliseconds

So, does it matter much? Nope.

SCHEMABINDING in this case serves a more important goal: stability of the underlying schema. You will probably find much better optimization opportunities by converting your function to an inline TVF than by chasing down obscure plan-affecting differences in a multi-statement TVF.
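
For comparison, an inline version of the test function might look like the sketch below. The name dbo.inlinetvf is made up for illustration, and the body simply mirrors the test functions above. Because an inline TVF is expanded into the calling query much like a view, the optimizer can estimate rows from the statistics on dbo.myobjects instead of using a fixed guess.

```sql
-- Hypothetical inline equivalent of the multi-statement test functions.
-- There is no BEGIN...END and no table variable: the body is a single SELECT.
CREATE FUNCTION dbo.inlinetvf(@UserID INT)
RETURNS TABLE
WITH SCHEMABINDING
AS
  RETURN (SELECT id FROM dbo.myobjects);
GO

-- Called the same way as the multi-statement versions
SELECT id FROM dbo.inlinetvf(1);
```

In a real function, the @UserID filter would go in the WHERE clause of that SELECT, where the optimizer can see it.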

Related Solutions

Sql-server – Why does this query become drastically slower when wrapped in a TVF

I isolated the problem to one line in the query. Keeping in mind that the query is 160 lines long, and I'm including the relevant tables either way, if I disable this line from the SELECT clause:

COALESCE(V.Visits, 0) * COALESCE(ACS.AvgClickCost, GAAC.AvgAdCost, 0.00)

...the run time drops from 63 minutes to five seconds (inlining a CTE has made it slightly faster than the original seven-second query). Including either ACS.AvgClickCost or GAAC.AvgAdCost causes the run time to explode. What makes it especially odd is that these fields come from two subqueries which have, respectively, ten rows and three! They each run in zero seconds when run independently, and with the row counts being so short I would expect the join time to be trivial even using nested loops.

Any guesses as to why this seemingly-harmless calculation would throw off a TVF completely, while it runs very quickly as a stand-alone query?

Sql-server – How to consider when deciding between passing a comma-delimited string to a stored procedure instead of calling it individually per record

Personally, I would pass the list of IDs to the stored procedure as a table parameter. That would let you perform a single set-based update instead of a less efficient row-by-row one.

I have never used EF myself, but a good article on doing this with ADO is linked below (ignore the fact that it says it is for SQL 2008, as it also works on 2005). The same strategy should work well for you in this situation, although you may need to adapt the implementation because you are using the Entity Framework.

http://www.mssqltips.com/sqlservertip/2112/table-value-parameters-in-sql-server-2008-and-net-c/

EDIT

As you rightly pointed out, I was wrong to say that this works on 2005. Sorry about that!

However, I have some alternate suggestions.

Because SQL Server 2005 does support table variables (just not as parameters to stored procedures, as you pointed out), you could parse the delimited string and insert the IDs into a table variable. You could then use the table variable to perform a set-based update.
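
A rough sketch of that approach, written to run on SQL Server 2005, could look like this. The table dbo.SomeTable, its Processed column, and the sample list are all invented for illustration. The loop assumes a well-formed, comma-separated list of integers with no spaces or empty elements.

```sql
-- Hypothetical input: in a stored procedure this would be a parameter
DECLARE @list VARCHAR(8000);
SET @list = '3,17,42';

DECLARE @ids TABLE (id INT);
DECLARE @pos INT, @next INT;

SET @list = @list + ',';   -- trailing delimiter so every element ends with a comma
SET @pos = 1;

-- Walk the string one element at a time
WHILE @pos <= LEN(@list)
BEGIN
    SET @next = CHARINDEX(',', @list, @pos);
    INSERT @ids (id)
    VALUES (CAST(SUBSTRING(@list, @pos, @next - @pos) AS INT));
    SET @pos = @next + 1;
END

-- One set-based update instead of one call per record
UPDATE t
SET t.Processed = 1
FROM dbo.SomeTable AS t
INNER JOIN @ids AS i ON i.id = t.id;
```

The parsing itself is still row by row, but it touches only the short list of IDs; the update against the real table happens once.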

Alternatively the link below provides a different take on the same problem by persisting the values to a table first thereby avoiding serialization and de-serialization of the id values:

http://weblogs.sqlteam.com/jeffs/archive/2007/06/26/passing-an-array-or-table-parameter-to-a-stored-procedure.aspx

I hope this helps you.
