Sql-server – Why does sql server need to convert count(*) result into int before comparing it with an int variable

sql serversql-server-2008

I have many queries in my application where in the having clause, I have comparison of count aggregate function with int variable.
In the query plans, I can see an implicit_convert before the comparison.
I want to know why this happens because as per sql server documentation, the return type of count function is int. So why should there be an implicit conversion for comparison of two int values?

Following is a part of one such query plan where @IdCount is defined as an int variable.

|--Filter(WHERE:([Expr1022]=[@IdCount]))    
 |--Compute Scalar(DEFINE:([Expr1022]=CONVERT_IMPLICIT(int,[Expr1028],0))) 
  |--Stream Aggregate(GROUP BY:([MOCK_DB].[dbo].[Scope].[ScopeID]) DEFINE:([Expr1028]=Count(*)))

Best Answer

The fact that you are comparing it against an integer variable is irrelevant.

The plan for COUNT always has an CONVERT_IMPLICIT(int,[ExprNNNN],0)) where ExprNNNN is the label for the expression representing the result of the COUNT.

My assumption has always been that the code for COUNT just ends up calling the same code as COUNT_BIG and the cast is necessary to convert the bigint result of that back down to int.

In fact COUNT_BIG(*) isn't even distinguished in the query plan from COUNT(*). Both show up as Scalar Operator(Count(*)).

COUNT_BIG(nullable_column) does get distinguished in the execution plan from COUNT(nullable_column) but the latter still gets an implicit cast back down to int.

Some evidence that this is the case is below.

WITH 
E1(N) AS 
(
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
)                                       -- 1*10^1 or 10 rows
, E2(N) AS (SELECT 1 FROM E1 a, E1 b)   -- 1*10^2 or 100 rows
, E4(N) AS (SELECT 1 FROM E2 a, E2 b)   -- 1*10^4 or 10,000 rows
, E8(N) AS (SELECT 1 FROM E4 a, E4 b)   -- 1*10^8 or 100,000,000 rows
, E16(N) AS (SELECT 1 FROM E8 a, E8 b)  -- 1*10^16 or 10,000,000,000,000,000 rows
, T(N) AS (SELECT TOP (2150000000) 
                  ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM E16)
SELECT COUNT(CASE WHEN N < 2150000000 THEN 1 END)
FROM T 
OPTION (MAXDOP 1)

This takes about 7 minutes to run on my desktop and returns the following

Msg 8115, Level 16, State 2, Line 1
Arithmetic overflow error converting expression to data type int.
Warning: Null value is eliminated by an aggregate or other SET operation.

Which indicates that the COUNT must have continued on after an int would have overflowed (at 2147483647) and the last row (2150000000) was processed by the COUNT operator leading to the message about NULL being returned.

By way of comparison replacing the COUNT expression with SUM(CASE WHEN N < 2150000000 THEN 1 END) returns

Msg 8115, Level 16, State 2, Line 1
Arithmetic overflow error converting expression to data type int.

with no ANSI warning about NULL. From which I conclude the overflow happened in this case during the aggregation itself before row 2,150,000,000 was reached.

Related Solutions

Sql-server – Why the query scan all the partitions

Your column is of type CHAR(6). Your filter is of type VARCHAR (perhaps counter intuitive, but that what the constant literal '200706' is). The rules of Data Type Precedence dictate that the comparison occurs using the type with higher precedence, ie. the VARCHAR type. This is a different type than the partitioning function, therefore you do not get partition elimination.

Try this instead:

select count(*)
from T (nolock)
where Month = CAST('200706' as CHAR(6));

Your plan should get a partitioning elimination filter.

Sql-server – Why would call to scalar function inside a Table Value Function be slower than outside the TVF

Scalar functions are called once-per-row, when called as part of a query.

Consider the following example.

Create a new, blank database for our tests:

USE master;
IF EXISTS (SELECT 1 FROM sys.databases d WHERE d.name = 'mv')
BEGIN
    ALTER DATABASE mv SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE mv;
END
GO
CREATE DATABASE mv;
GO

Create a table, a multi-statement function, and a table-valued-function:

USE mv;
GO
CREATE TABLE dbo.t
(
    t_id int NOT NULL
        CONSTRAINT PK_t
        PRIMARY KEY CLUSTERED
);
GO

CREATE FUNCTION dbo.t_func
(
    @t_id int
)
RETURNS bit
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @r bit;
    IF EXISTS (SELECT 1 FROM dbo.t WHERE t.t_id = @t_id)
        SET @r = 1
    ELSE
        SET @r = 0;
    RETURN @r;
END
GO

CREATE FUNCTION dbo.t_tvf
(
    @min_t_id int
    , @max_t_id int
)
RETURNS TABLE 
WITH SCHEMABINDING
AS
RETURN (
    SELECT t_id = t.t_id
        , e = dbo.t_func(dbo.t.t_id)
    FROM dbo.t
    WHERE t.t_id >= @min_t_id
        AND t.t_id <= @max_t_id
);
GO

Insert some sample data into the table:

INSERT INTO dbo.t (t_id)
SELECT ROW_NUMBER() OVER (ORDER BY c.id, c.colid)
FROM sys.syscolumns c;
GO

Create a table to store function execution stats, and populate it with a start-row showing execution counts for the multi-statement-function, t_func:

CREATE TABLE dbo.function_stats
(
    run_num int NOT NULL
    , object_name sysname NOT NULL
    , execution_count int NULL 
    , CONSTRAINT PK_function_stats
        PRIMARY KEY CLUSTERED (run_num, object_name)
);
GO
INSERT INTO dbo.function_stats (run_num, object_name, execution_count)
SELECT 1
    , o.name
    , COALESCE(fs.execution_count, 0)
FROM sys.objects o 
    LEFT JOIN sys.dm_exec_function_stats fs ON fs.object_id = o.object_id
WHERE o.name = 't_func';
GO

Run a query against the TVF:
```
SELECT t.*
FROM dbo.t_tvf(1, 2) t;
GO
```

Capture the execution stats now:

INSERT INTO dbo.function_stats (run_num, object_name, execution_count)
SELECT 2
    , o.name
    , COALESCE(fs.execution_count, 0)
FROM sys.objects o 
    LEFT JOIN sys.dm_exec_function_stats fs ON fs.object_id = o.object_id
WHERE o.name = 't_func';

The function stats results:

SELECT *
FROM dbo.function_stats fs
ORDER BY fs.run_num
    , fs.object_name;

╔═════════╦═════════════╦═════════════════╗
║ run_num ║ object_name ║ execution_count ║
╠═════════╬═════════════╬═════════════════╣
║       1 ║ t_func      ║               0 ║
║       2 ║ t_func      ║               2 ║
╚═════════╩═════════════╩═════════════════╝

As you can see, the multi-statement-function has execute twice, once per row for the source table accessed by the TVF.

I expect the mutli-statement-function is being called many, many times by the TVF, giving the impression that it is running slowly, whereas in fact it is simply being called many times.

Best Answer

Related Solutions

Sql-server – Why the query scan all the partitions

Sql-server – Why would call to scalar function inside a Table Value Function be slower than outside the TVF

Related Question