Sql-server – Why would call to scalar function inside a Table Value Function be slower than outside the TVF

functions, performance, sql server, sql-server-2008-r2

I am writing a table-valued function (TVF), and calling the function takes 10x as long as running the same code directly. I traced this to a call to a multi-statement scalar function inside the TVF. The call to the scalar function is excessively slow when it is made from within the TVF. The scalar function takes 3 int parameters and returns a single int result.

What would cause it to be slower from within a TVF?

First, the TVF is basically a sort of pivot table, returning just one row with 13 columns.

The scalar function is a multi-statement scalar. It looks for a matched set of keys in one of two tables (i.e. the input is compared with columnA and the output is columnB, where input = columnA and the row is active). It searches table 1, then table 2, and finally table 1 again with a slightly changed where clause.

Usage was originally a cte query with a join clause:

```
WITH cte AS (
    SELECT COUNT(*) OVER (PARTITION BY c.c1) AS recs,
           c.setId, c.c1, c.c2, n.name
    FROM tbl c
        INNER JOIN tbl n ON n.id = schma.scal(c.c1, c.c2, c.c3)
    WHERE c.setId = @setId AND c.endDt IS NULL
)
```

This cte was originally referenced 11 times, each time returning from zero (about half the time) to maybe 3 rows (with a where clause like c1=42). Since there are really only 4-15 rows in total, while the base tables have ~1,000,000 rows, I thought I could cut the number of scalar calls down to at most 15 by putting the values into a table variable and then doing an update. This wasn't really any faster, but it did let me prove that the call to the scalar function was what slowed things down.

That looked like:

```
INSERT INTO @cte (recs, setId, c1, c2)
SELECT COUNT(*) OVER (PARTITION BY c.c1) AS recs, c.setId, c.c1, c.c2
FROM tbl c
WHERE c.setId = @setId AND c.endDt IS NULL;

UPDATE c
SET nId = schma.scal(c.c1, c.c2, c.c3)
FROM @cte c
```

The @cte had, as I said, from 4 to 15 rows (11 for my most used setId). I concluded that the scalar function was causing the problem by placing a RETURN first after and then before this update. The timings were basically tens of milliseconds up to the update, more than a second for the update itself, and another few milliseconds to finish.

The query plan from SSMS wasn't useful, and I was thinking about seeing if I could get more details using the plans in the DMVs.
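
One way to pull those cached plans is sketched below. It is a minimal example, assuming the scalar function is `schma.scal` as in the snippets above and that its plan is still in the plan cache of the current database. The two DMVs used here, `sys.dm_exec_cached_plans` and `sys.dm_exec_query_plan`, are available in SQL Server 2008 R2, and reading them requires VIEW SERVER STATE permission.

```sql
-- Look up the cached plan compiled for the scalar function itself.
-- Assumption: schma.scal lives in the current database.
SELECT cp.objtype
    , cp.usecounts          -- how often the cached plan has been reused
    , cp.size_in_bytes
    , qp.query_plan         -- XML showplan; click it in SSMS to open it graphically
FROM sys.dm_exec_cached_plans cp
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
WHERE qp.dbid = DB_ID()
    AND qp.objectid = OBJECT_ID(N'schma.scal');
```

The plan of the outer query usually shows the scalar call only as a Compute Scalar, so the function's own plan is where the table lookups become visible.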

Knowing that TVFs can be used to replace scalars, and that this can improve performance, I gave that a try, and it worked (times in ~100 ms range for the whole query when called inside the original TVF).

I'm still scratching my head as to why it would take longer to call from within the TVF than from outside.

From outside the TVF, I get comparable times using the scalar as I do using the new TVF inside the original TVF.

Best Answer

Scalar functions are called once per row when they are used as part of a query.

Consider the following example.

Create a new, blank database for our tests:

```
USE master;
IF EXISTS (SELECT 1 FROM sys.databases d WHERE d.name = 'mv')
BEGIN
    ALTER DATABASE mv SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE mv;
END
GO
CREATE DATABASE mv;
GO
```

Create a table, a multi-statement scalar function, and an inline table-valued function that calls it:

```
USE mv;
GO
CREATE TABLE dbo.t
(
    t_id int NOT NULL
        CONSTRAINT PK_t
        PRIMARY KEY CLUSTERED
);
GO

CREATE FUNCTION dbo.t_func
(
    @t_id int
)
RETURNS bit
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @r bit;
    IF EXISTS (SELECT 1 FROM dbo.t WHERE t.t_id = @t_id)
        SET @r = 1
    ELSE
        SET @r = 0;
    RETURN @r;
END
GO

CREATE FUNCTION dbo.t_tvf
(
    @min_t_id int
    , @max_t_id int
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN (
    SELECT t_id = t.t_id
        , e = dbo.t_func(dbo.t.t_id)
    FROM dbo.t
    WHERE t.t_id >= @min_t_id
        AND t.t_id <= @max_t_id
);
GO
```

Insert some sample data into the table:

```
INSERT INTO dbo.t (t_id)
SELECT ROW_NUMBER() OVER (ORDER BY c.id, c.colid)
FROM sys.syscolumns c;
GO
```

Create a table to store function execution stats, and populate it with a starting row showing the execution count for the multi-statement function, t_func:

```
CREATE TABLE dbo.function_stats
(
    run_num int NOT NULL
    , object_name sysname NOT NULL
    , execution_count int NULL
    , CONSTRAINT PK_function_stats
        PRIMARY KEY CLUSTERED (run_num, object_name)
);
GO
INSERT INTO dbo.function_stats (run_num, object_name, execution_count)
SELECT 1
    , o.name
    , COALESCE(fs.execution_count, 0)
FROM sys.objects o
    LEFT JOIN sys.dm_exec_function_stats fs ON fs.object_id = o.object_id
WHERE o.name = 't_func';
GO
```
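
Note that `sys.dm_exec_function_stats` was introduced in SQL Server 2016, so it is not available on the SQL Server 2008 R2 instance the question is tagged with. A rough substitute on older versions is to read per-statement counts from `sys.dm_exec_query_stats`. The sketch below assumes the statements inside `dbo.t_func` are still in the plan cache, and it only returns rows once the function has actually run, so it belongs after the TVF query in the next step.

```sql
-- Per-statement execution counts for statements cached under dbo.t_func.
-- Assumption: run in the mv database after the function has executed at least once.
SELECT function_name = OBJECT_NAME(st.objectid, st.dbid)
    , statement_text = SUBSTRING(st.text
        , (qs.statement_start_offset / 2) + 1
        , (CASE qs.statement_end_offset
               WHEN -1 THEN DATALENGTH(st.text)
               ELSE qs.statement_end_offset
           END - qs.statement_start_offset) / 2 + 1)
    , qs.execution_count
    , qs.total_worker_time
FROM sys.dm_exec_query_stats qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
WHERE st.dbid = DB_ID()
    AND st.objectid = OBJECT_ID(N'dbo.t_func');
```

Each statement in the function body may appear as its own row, so the counts are per statement rather than per function call.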

Run a query against the TVF:
```
SELECT t.*
FROM dbo.t_tvf(1, 2) t;
GO
```

Capture the execution stats now:

```
INSERT INTO dbo.function_stats (run_num, object_name, execution_count)
SELECT 2
    , o.name
    , COALESCE(fs.execution_count, 0)
FROM sys.objects o
    LEFT JOIN sys.dm_exec_function_stats fs ON fs.object_id = o.object_id
WHERE o.name = 't_func';
```

The function stats results:

```
SELECT *
FROM dbo.function_stats fs
ORDER BY fs.run_num
    , fs.object_name;
```

╔═════════╦═════════════╦═════════════════╗
║ run_num ║ object_name ║ execution_count ║
╠═════════╬═════════════╬═════════════════╣
║       1 ║ t_func      ║               0 ║
║       2 ║ t_func      ║               2 ║
╚═════════╩═════════════╩═════════════════╝

As you can see, the multi-statement function has executed twice: once for each row the TVF read from the source table.
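
To see the count grow with the number of rows, the same check can be repeated with a wider range. This is a sketch that reuses the objects above; the run number 3 and the range 1 to 100 are arbitrary choices, and it assumes `dbo.t` holds at least 100 rows. The counter is cumulative for as long as the function's plan stays cached, so compare the difference between runs rather than the absolute value.

```sql
-- Run the TVF over a wider range (assumed: at least 100 rows in dbo.t).
SELECT t.*
FROM dbo.t_tvf(1, 100) t;
GO

-- Capture the cumulative execution count as run 3.
INSERT INTO dbo.function_stats (run_num, object_name, execution_count)
SELECT 3
    , o.name
    , COALESCE(fs.execution_count, 0)
FROM sys.objects o
    LEFT JOIN sys.dm_exec_function_stats fs
        ON fs.object_id = o.object_id
        AND fs.database_id = DB_ID()
WHERE o.name = 't_func';
GO

-- Show how many calls each run added compared with the previous one.
SELECT cur.run_num
    , calls_since_previous_run = cur.execution_count - prev.execution_count
FROM dbo.function_stats cur
    INNER JOIN dbo.function_stats prev
        ON prev.object_name = cur.object_name
        AND prev.run_num = cur.run_num - 1
ORDER BY cur.run_num;
```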

I expect the multi-statement function is being called many, many times by the TVF, giving the impression that it is running slowly, whereas in fact it is simply being called many times.
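
The question reports that replacing the scalar function with a TVF removed the slowdown. With the demo objects above, that kind of rewrite could look like the sketch below. The names `dbo.t_func_inline` and `dbo.t_tvf_inline` are hypothetical and not part of the original answer. Because both functions are inline, the optimizer can expand them into the calling query instead of invoking a separate function for every row.

```sql
-- Inline replacement for dbo.t_func: returns one row with a single bit column.
CREATE FUNCTION dbo.t_func_inline
(
    @t_id int
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN (
    SELECT e = CONVERT(bit
        , CASE WHEN EXISTS (SELECT 1 FROM dbo.t x WHERE x.t_id = @t_id)
               THEN 1 ELSE 0 END)
);
GO

-- Same shape as dbo.t_tvf, but the lookup is applied with CROSS APPLY
-- instead of a scalar function call in the SELECT list.
CREATE FUNCTION dbo.t_tvf_inline
(
    @min_t_id int
    , @max_t_id int
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN (
    SELECT t_id = src.t_id
        , e = f.e
    FROM dbo.t src
        CROSS APPLY dbo.t_func_inline(src.t_id) f
    WHERE src.t_id >= @min_t_id
        AND src.t_id <= @max_t_id
);
GO

SELECT t.*
FROM dbo.t_tvf_inline(1, 2) t;
GO
```

Whether this helps in the asker's case depends on whether the three-step lookup in the real scalar function can be expressed as a single query.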

Related Solutions

Sql-server – Why does this query become drastically slower when wrapped in a TVF

I isolated the problem to one line in the query. Keeping in mind that the query is 160 lines long, and I'm including the relevant tables either way, if I disable this line from the SELECT clause:

```
COALESCE(V.Visits, 0) * COALESCE(ACS.AvgClickCost, GAAC.AvgAdCost, 0.00)
```

...the run time drops from 63 minutes to five seconds (inlining a CTE has made it slightly faster than the original seven-second query). Including either ACS.AvgClickCost or GAAC.AvgAdCost causes the run time to explode. What makes it especially odd is that these fields come from two subqueries which have, respectively, ten rows and three! They each run in zero seconds when run independently, and with the row counts being so short I would expect the join time to be trivial even using nested loops.

Any guesses as to why this seemingly-harmless calculation would throw off a TVF completely, while it runs very quickly as a stand-alone query?

Db2 – Why cannot I call a table function in iSeries DB2 that I just created

As Heinz Z. did, I discovered the problem.

One of your function parameters is char, while you pass the string literal 'ANY', which is considered a varchar. The database engine looks for an overloaded version of the function with varchar parameters, but doesn't find it.

The solution is either to change the function parameter to varchar, or to cast the argument to char in the function call:

```
SELECT *
FROM TABLE(TESTDAT.FNREPORT(DATE('10/23/2013'),
                            DATE('10/23/2013'),
                            CAST('ANY' AS CHAR(3))
          )) AS T
```

If that doesn't work, try removing all parameters from the function to see whether they are the culprits. Then you can investigate further by adding them back one by one. For the dates, for example, you can try passing the current date instead of 10/23/2013.

You should also investigate why it thinks that FNREPORT is a type and not a function...
