The fact that you are comparing it against an integer variable is irrelevant. The plan for COUNT always has a CONVERT_IMPLICIT(int,[ExprNNNN],0), where ExprNNNN is the label for the expression representing the result of the COUNT.
My assumption has always been that the code for COUNT just ends up calling the same code as COUNT_BIG, and that the cast is necessary to convert the bigint result of that back down to int.
In fact, COUNT_BIG(*) isn't even distinguished in the query plan from COUNT(*): both show up as Scalar Operator(Count(*)).
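A minimal way to see this for yourself (any convenient table will do; sys.objects is used here purely for illustration):

```sql
-- Show the actual execution plan for this query and inspect the operators:
SELECT COUNT(*) FROM sys.objects;
-- Stream Aggregate defines:    [ExprNNNN] = Scalar Operator(Count(*))
-- Compute Scalar then applies: CONVERT_IMPLICIT(int,[ExprNNNN],0)
-- Running the same query with COUNT_BIG(*) shows an identical Count(*)
-- aggregate, but with no trailing convert.
```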
COUNT_BIG(nullable_column) does get distinguished in the execution plan from COUNT(nullable_column), but the latter still gets an implicit cast back down to int.
Some evidence that this is the case is below.
WITH
E1(N) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
) -- 1*10^1 or 10 rows
, E2(N) AS (SELECT 1 FROM E1 a, E1 b) -- 1*10^2 or 100 rows
, E4(N) AS (SELECT 1 FROM E2 a, E2 b) -- 1*10^4 or 10,000 rows
, E8(N) AS (SELECT 1 FROM E4 a, E4 b) -- 1*10^8 or 100,000,000 rows
, E16(N) AS (SELECT 1 FROM E8 a, E8 b) -- 1*10^16 or 10,000,000,000,000,000 rows
, T(N) AS (SELECT TOP (2150000000)
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM E16)
SELECT COUNT(CASE WHEN N < 2150000000 THEN 1 END)
FROM T
OPTION (MAXDOP 1)
This takes about 7 minutes to run on my desktop and returns the following:
Msg 8115, Level 16, State 2, Line 1
Arithmetic overflow error converting expression to data type int.
Warning: Null value is eliminated by an aggregate or other SET operation.
This indicates that the COUNT must have continued on after the point where an int would have overflowed (2,147,483,647), and that the last row (N = 2,150,000,000, for which the CASE expression returns NULL) was processed by the COUNT operator, leading to the warning about a NULL value being eliminated.
By way of comparison, replacing the COUNT expression with SUM(CASE WHEN N < 2150000000 THEN 1 END) returns:
Msg 8115, Level 16, State 2, Line 1
Arithmetic overflow error converting expression to data type int.
with no ANSI warning about NULL. From this I conclude that the overflow happened, in this case, during the aggregation itself, before row 2,150,000,000 was reached.
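If the full range of rows genuinely needs counting, the usual fixes are COUNT_BIG, or casting before aggregating so the running total is bigint throughout. A sketch, reusing the T common table expression from the query above:

```sql
-- Either of these completes without the int overflow:
SELECT COUNT_BIG(CASE WHEN N < 2150000000 THEN 1 END)
FROM T
OPTION (MAXDOP 1);

SELECT SUM(CASE WHEN N < 2150000000 THEN CAST(1 AS bigint) END)
FROM T
OPTION (MAXDOP 1);
```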
However, execution plans still show an implicit conversion warning even though the conversion happens when the table is updated, not when it is read from.
Persisting a computed column does not guarantee that the persisted value will be used. The optimizer makes a cost-based decision between using the persisted value and computing the expression afresh, though there are also other factors in play. Simplifying, the process looks like this:
- The computed column reference is always expanded to its definition before query compilation and optimization begins. This is very similar to the way view references are handled (unless indexed and referenced with a NOEXPAND hint). The expansion provides the greatest opportunity for orthogonal simplifications and optimizations to be applied.
- Depending on the query and the code path taken through the query optimizer, a cardinality estimate may be requested for an expression in the expanded computed column definition. Deriving this cardinality estimation may result in a plan-affecting convert warning being added to an internal warning list.
- The expanded computed column expression may be matched back to a persisted column or index later in the compilation and optimization process. Any plan-affecting convert warning added previously is not tracked, and so is not removed when this substitution occurs.
Like most warnings, the plan-affecting convert warning is opportunistic and informational. It is opportunistic in that it is only added if the optimizer follows a code path that attempts to compute cardinality on a suitable expression. This is similar to 'missing index suggestions', which are only added if the optimizer attempts to match to an 'ideal' index definition that is not found. To put it another way: neither of these facilities are based on exhaustive analysis.
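A minimal sketch of the situation described above (the table and column names are hypothetical):

```sql
-- A persisted computed column over a convert:
CREATE TABLE dbo.Example
(
    c1 varchar(10) NOT NULL,
    c2 AS CONVERT(int, c1) PERSISTED
);

-- A query touching the column. During compilation the computed column is
-- expanded to its definition; a cardinality estimate derived over the
-- converted expression can add the plan-affecting convert warning, and
-- matching back to the persisted column c2 later does not remove it:
SELECT c1
FROM dbo.Example
WHERE c2 = 1;
```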
I tried using a UDF in the computed column definition to get rid of the warning
This 'works' because the optimizer cannot expand a scalar UDF into its definition before optimization. The UDF is a 'black box' with guessed cardinality and horrible runtime performance (close to the cost of running a complete separate query per function invocation). Without expansion, a cardinality estimation that might generate the warning cannot occur.
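A sketch of that workaround, with hypothetical names, to show why it suppresses the warning:

```sql
-- Wrapping the convert in a scalar UDF (schema binding is required for
-- use in a computed column):
CREATE FUNCTION dbo.ConvertToInt (@value varchar(10))
RETURNS int
WITH SCHEMABINDING
AS
BEGIN
    RETURN CONVERT(int, @value);
END;
GO
-- A computed column defined as dbo.ConvertToInt(c1) cannot be expanded
-- before optimization, so no cardinality estimate is derived over the
-- convert and the warning never appears -- at the cost of a guessed
-- estimate and per-row function invocation overhead.
```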
Is there a reason to consider the warning anything other than a bug?
The warning indicates that the convert may affect plan quality because a cardinality estimation was performed on a problematic expression. The complexity of compilation and optimization is such that it is impossible to say if the cardinality estimate will affect the final quality of the plan or not, even if the computed column does end up being resolved to a persisted value or index.
So, the warning is generally useful because it indicates that internal optimization decisions may have been adversely impacted by the convert. I would always check a plan with this warning for inaccurate cardinality estimations, and any resulting performance or resource usage implications.
A final aside: the SQL Server version is not mentioned in the question, but for 2012 and later, TRY_CAST or TRY_CONVERT are more robust ways of handling this sort of requirement. That said, this facility will not generally affect convert warnings one way or the other.
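For example:

```sql
-- CONVERT raises an error on a value that cannot be converted;
-- TRY_CONVERT returns NULL instead:
SELECT TRY_CONVERT(int, '123');  -- 123
SELECT TRY_CONVERT(int, 'abc');  -- NULL, rather than a conversion error
```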
Best Answer
There are two main issues which cause implicit (or explicit) conversion to make a big difference to the query plan:
The first issue arises where a join or filter predicate, or an ordering or grouping, is over a converted column. Note that this applies equally to any function applied to a column used in such a context; a column that is merely selected has far less impact.
This means, firstly, that the compiler cannot use an index seek to get the data, nor can it rely on the conversion output being in the correct order for operations such as a Merge Join or Stream Aggregate.
This particular point is far more insidious than some realize. With some conversions, for example int to float, the compiler understands them to be order-preserving, but conversions such as varchar to nvarchar carry no such guarantee. Therefore, when joining on, say, a varchar column and an nvarchar column, it is necessary to decide which table will be the one to be seeked, and an explicit conversion is often better because it makes that decision deliberate.
The second issue is the more well-known one: no statistics. The compiler has no understanding of what the result of the conversion implies, so it cannot use statistics to determine estimated row counts and densities. This can affect the whole plan, and often makes a bigger difference the deeper in the plan it occurs.
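A sketch of the first point, with hypothetical table and column names. By data type precedence, the varchar side of a mixed join is implicitly converted to nvarchar, which can prevent an efficient seek on that side; placing the conversion explicitly chooses which side pays:

```sql
-- Implicit: a.code (varchar) is converted to nvarchar to match b.code,
-- which can prevent an index seek on dbo.A.code:
SELECT a.id
FROM dbo.A AS a
JOIN dbo.B AS b ON a.code = b.code;

-- Explicit: converting the nvarchar side instead allows a seek on
-- dbo.A.code (only safe if every b.code value fits in varchar):
SELECT a.id
FROM dbo.A AS a
JOIN dbo.B AS b ON a.code = CONVERT(varchar(50), b.code);
```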
If the conversion or function appears only in the select list (and is not used as a predicate in outer parts of the query), then there is far less impact, even though the same warning occurs. In this case the conversion normally happens just before the row is output from the query, and does not normally affect the query plan at all.
Two cases where a conversion in the select can make a difference are:
- max types: it is beneficial to ensure that the conversion happens as late as possible.
- CROSS APPLY: here the opposite is true: do the calculation earlier, and prevent the compiler from trying to nest the whole thing as one big Compute Scalar. OUTER APPLY is your friend; it seems to give an optimization-fence effect.
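A sketch of that second case, with hypothetical names:

```sql
-- Materializing the calculation in an OUTER APPLY, rather than writing
-- the conversion inline everywhere it is needed, discourages the
-- optimizer from folding everything into one large Compute Scalar:
SELECT t.id,
       x.val,
       x.val * 2 AS doubled
FROM dbo.T AS t
OUTER APPLY (SELECT CONVERT(decimal(18, 2), t.raw_value) AS val) AS x;
```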