The fact that you are comparing it against an integer variable is irrelevant. The plan for COUNT always has a CONVERT_IMPLICIT(int,[ExprNNNN],0), where ExprNNNN is the label for the expression representing the result of the COUNT.
My assumption has always been that the code for COUNT just ends up calling the same code as COUNT_BIG, and that the cast is necessary to convert the bigint result of that back down to int.
In fact, COUNT_BIG(*) isn't even distinguished in the query plan from COUNT(*): both show up as Scalar Operator(Count(*)).
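A minimal way to see this for yourself (any convenient table will do; sys.objects is used here purely for illustration):

```sql
-- Show the actual execution plan for this query and inspect the operators:
SELECT COUNT(*) FROM sys.objects;
-- Stream Aggregate defines:    [ExprNNNN] = Scalar Operator(Count(*))
-- Compute Scalar then applies: CONVERT_IMPLICIT(int,[ExprNNNN],0)
-- Running the same query with COUNT_BIG(*) shows an identical Count(*)
-- aggregate, but with no trailing convert.
```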
COUNT_BIG(nullable_column) does get distinguished in the execution plan from COUNT(nullable_column), but the latter still gets an implicit cast back down to int.
Some evidence that this is the case is below.
WITH
E1(N) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
) -- 1*10^1 or 10 rows
, E2(N) AS (SELECT 1 FROM E1 a, E1 b) -- 1*10^2 or 100 rows
, E4(N) AS (SELECT 1 FROM E2 a, E2 b) -- 1*10^4 or 10,000 rows
, E8(N) AS (SELECT 1 FROM E4 a, E4 b) -- 1*10^8 or 100,000,000 rows
, E16(N) AS (SELECT 1 FROM E8 a, E8 b) -- 1*10^16 or 10,000,000,000,000,000 rows
, T(N) AS (SELECT TOP (2150000000)
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM E16)
SELECT COUNT(CASE WHEN N < 2150000000 THEN 1 END)
FROM T
OPTION (MAXDOP 1)
This takes about 7 minutes to run on my desktop and returns the following:
Msg 8115, Level 16, State 2, Line 1
Arithmetic overflow error converting expression to data type int.
Warning: Null value is eliminated by an aggregate or other SET operation.
This indicates that the COUNT must have continued on after the point where an int would have overflowed (2,147,483,647), and that the last row (N = 2,150,000,000, for which the CASE expression returns NULL) was processed by the COUNT operator, leading to the warning about a NULL value being eliminated.
By way of comparison, replacing the COUNT expression with SUM(CASE WHEN N < 2150000000 THEN 1 END) returns:
Msg 8115, Level 16, State 2, Line 1
Arithmetic overflow error converting expression to data type int.
with no ANSI warning about NULL. From this I conclude that the overflow happened, in this case, during the aggregation itself, before row 2,150,000,000 was reached.
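If the full range of rows genuinely needs counting, the usual fixes are COUNT_BIG, or casting before aggregating so the running total is bigint throughout. A sketch, reusing the T common table expression from the query above:

```sql
-- Either of these completes without the int overflow:
SELECT COUNT_BIG(CASE WHEN N < 2150000000 THEN 1 END)
FROM T
OPTION (MAXDOP 1);

SELECT SUM(CASE WHEN N < 2150000000 THEN CAST(1 AS bigint) END)
FROM T
OPTION (MAXDOP 1);
```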
However, execution plans still show an implicit conversion warning even though the conversion happens when the table is updated, not when it is read from.
Persisting a computed column does not guarantee that the persisted value will be used. The optimizer makes a cost-based decision between using the persisted value and computing the expression afresh, though there are also other factors in play. Simplifying, the process looks like this:
- The computed column reference is always expanded to its definition before query compilation and optimization begins. This is very similar to the way view references are handled (unless indexed and referenced with a NOEXPAND hint). The expansion provides the greatest opportunity for orthogonal simplifications and optimizations to be applied.
- Depending on the query and the code path taken through the query optimizer, a cardinality estimate may be requested for an expression in the expanded computed column definition. Deriving this cardinality estimation may result in a plan-affecting convert warning being added to an internal warning list.
- The expanded computed column expression may be matched back to a persisted column or index later in the compilation and optimization process. Any plan-affecting convert warning added previously is not tracked, and so is not removed when this substitution occurs.
Like most warnings, the plan-affecting convert warning is opportunistic and informational. It is opportunistic in that it is only added if the optimizer follows a code path that attempts to compute cardinality on a suitable expression. This is similar to 'missing index suggestions', which are only added if the optimizer attempts to match to an 'ideal' index definition that is not found. To put it another way: neither of these facilities are based on exhaustive analysis.
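A minimal sketch of the situation described above (the table and column names are hypothetical):

```sql
-- A persisted computed column over a convert:
CREATE TABLE dbo.Example
(
    c1 varchar(10) NOT NULL,
    c2 AS CONVERT(int, c1) PERSISTED
);

-- A query touching the column. During compilation the computed column is
-- expanded to its definition; a cardinality estimate derived over the
-- converted expression can add the plan-affecting convert warning, and
-- matching back to the persisted column c2 later does not remove it:
SELECT c1
FROM dbo.Example
WHERE c2 = 1;
```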
I tried using a UDF in the computed column definition to get rid of the warning
This 'works' because the optimizer cannot expand a scalar UDF into its definition before optimization. The UDF is a 'black box' with guessed cardinality and horrible runtime performance (close to the cost of running a complete separate query per function invocation). Without expansion, a cardinality estimation that might generate the warning cannot occur.
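A sketch of that workaround, with hypothetical names, to show why it suppresses the warning:

```sql
-- Wrapping the convert in a scalar UDF (schema binding is required for
-- use in a computed column):
CREATE FUNCTION dbo.ConvertToInt (@value varchar(10))
RETURNS int
WITH SCHEMABINDING
AS
BEGIN
    RETURN CONVERT(int, @value);
END;
GO
-- A computed column defined as dbo.ConvertToInt(c1) cannot be expanded
-- before optimization, so no cardinality estimate is derived over the
-- convert and the warning never appears -- at the cost of a guessed
-- estimate and per-row function invocation overhead.
```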
Is there a reason to consider the warning anything other than a bug?
The warning indicates that the convert may affect plan quality because a cardinality estimation was performed on a problematic expression. The complexity of compilation and optimization is such that it is impossible to say if the cardinality estimate will affect the final quality of the plan or not, even if the computed column does end up being resolved to a persisted value or index.
So, the warning is generally useful because it indicates that internal optimization decisions may have been adversely impacted by the convert. I would always check a plan with this warning for inaccurate cardinality estimations, and any resulting performance or resource usage implications.
A final aside: the SQL Server version is not mentioned in the question, but for 2012 and later, TRY_CAST or TRY_CONVERT are more robust ways of handling this sort of requirement. That said, this facility will not generally affect convert warnings one way or the other.
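For example:

```sql
-- CONVERT raises an error on a value that cannot be converted;
-- TRY_CONVERT returns NULL instead:
SELECT TRY_CONVERT(int, '123');  -- 123
SELECT TRY_CONVERT(int, 'abc');  -- NULL, rather than a conversion error
```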
Best Answer
There are two main issues which cause implicit (or explicit) conversion to make a big difference to the query plan:
The first issue arises where a join or filter predicate, or an ordering or grouping, is over a converted column. Note that this applies equally to any function applied to a column used in such a context; a column that is merely selected has far less impact.
This means, firstly, that the compiler cannot use an index seek to get the data, nor can it rely on the conversion output being in the correct order for operations such as a Merge Join or Stream Aggregate.
This particular point is far more insidious than some realize. With some conversions, for example int to float, the compiler understands them to be order-preserving, but conversions such as varchar to nvarchar carry no such guarantee. Therefore, when joining on, say, a varchar column and an nvarchar column, it is necessary to decide which table will be the one to be seeked, and an explicit conversion is often better because it makes that decision deliberate.
The second issue is the more well-known one: no statistics. The compiler has no understanding of what the result of the conversion implies, so it cannot use statistics to determine estimated row counts and densities. This can affect the whole plan, and often makes a bigger difference the deeper in the plan it occurs.
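A sketch of the first point, with hypothetical table and column names. By data type precedence, the varchar side of a mixed join is implicitly converted to nvarchar, which can prevent an efficient seek on that side; placing the conversion explicitly chooses which side pays:

```sql
-- Implicit: a.code (varchar) is converted to nvarchar to match b.code,
-- which can prevent an index seek on dbo.A.code:
SELECT a.id
FROM dbo.A AS a
JOIN dbo.B AS b ON a.code = b.code;

-- Explicit: converting the nvarchar side instead allows a seek on
-- dbo.A.code (only safe if every b.code value fits in varchar):
SELECT a.id
FROM dbo.A AS a
JOIN dbo.B AS b ON a.code = CONVERT(varchar(50), b.code);
```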
If the conversion or function appears only in the select list (and is not used as a predicate in outer parts of the query), then there is far less impact, even though the same warning occurs. In this case the conversion normally happens just before the row is output from the query, and does not normally affect the query plan at all.
Two cases where a conversion in the select can make a difference are:
- max types: it is beneficial to ensure that the conversion happens as late as possible.
- CROSS APPLY: here the opposite is true: do the calculation earlier, and prevent the compiler from trying to nest the whole thing as one big Compute Scalar. OUTER APPLY is your friend; it seems to give an optimization-fence effect.
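A sketch of that second case, with hypothetical names:

```sql
-- Materializing the calculation in an OUTER APPLY, rather than writing
-- the conversion inline everywhere it is needed, discourages the
-- optimizer from folding everything into one large Compute Scalar:
SELECT t.id,
       x.val,
       x.val * 2 AS doubled
FROM dbo.T AS t
OUTER APPLY (SELECT CONVERT(decimal(18, 2), t.raw_value) AS val) AS x;
```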