Is an implicit conversion in a UNION ALL, when one of the data types is VARCHAR(MAX), something to worry about?
e.g.
DECLARE @value1 VARCHAR(1)= 'a';
DECLARE @value2 VARCHAR(MAX)= 'a';
SELECT @value1
UNION ALL
SELECT @value2;
If you look at the execution plan there will be a CONVERT_IMPLICIT warning:
(Scalar Operator(CONVERT_IMPLICIT(varchar(max),[@value1],0))), (Scalar Operator([@value2]))
Is this going to behave like a normal conversion, i.e. convert all the VARCHAR(1) columns to VARCHAR(MAX), and have an impact on performance? Is there a way to avoid it other than changing the table structure?
Best Answer
Yes, this can impact performance. Not because of the conversion itself, but because of memory requirements. Let's fire up a quick example:
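The original setup code is not shown here, so the following is a minimal sketch of the kind of tables the rest of the answer refers to as 1, 2, and 3. The table and column names (dbo.t1, dbo.t2, dbo.t3, column v) and the VARCHAR(32) length are placeholders, not from the original; the essential point is that tables 1 and 2 use a bounded varchar while table 3 uses varchar(max), all holding identical data:

```sql
-- Placeholder tables: identical data, only the declared column type differs.
CREATE TABLE dbo.t1 (v VARCHAR(32));   -- bounded varchar
CREATE TABLE dbo.t2 (v VARCHAR(32));   -- bounded varchar
CREATE TABLE dbo.t3 (v VARCHAR(MAX));  -- max varchar

-- Populate all three with the same short strings (source query is arbitrary).
INSERT dbo.t1 (v)
  SELECT TOP (10000) LEFT(c.name, 32)
  FROM sys.all_columns AS c
  ORDER BY c.[object_id];

INSERT dbo.t2 (v) SELECT v FROM dbo.t1;
INSERT dbo.t3 (v) SELECT v FROM dbo.t1;
```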
Now, let's run a simple UNION ALL query that does something that requires memory (like a sort) between these tables, and compare 1 UNION 2 vs. 1 UNION 3 vs. 3 UNION 1:
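The three comparison queries might look something like this (a sketch, assuming placeholder tables dbo.t1 and dbo.t2 with a bounded varchar column v and dbo.t3 with a varchar(max) column; the ORDER BY supplies the memory-consuming sort):

```sql
-- 1 UNION 2: both sides bounded varchar
SELECT v FROM dbo.t1 UNION ALL SELECT v FROM dbo.t2 ORDER BY v;

-- 1 UNION 3: one side varchar(max), so the whole result is typed varchar(max)
SELECT v FROM dbo.t1 UNION ALL SELECT v FROM dbo.t3 ORDER BY v;

-- 3 UNION 1: same as above with the inputs swapped
SELECT v FROM dbo.t3 UNION ALL SELECT v FROM dbo.t1 ORDER BY v;
```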
(the last one to show that order doesn't matter). Now if we look at the plans, they look similar enough, and in an isolated test on an otherwise idle machine, they even perform quite similarly:
And you will see similar things in Management Studio: it will tell you that all three plans have an equal share of the cost of the batch (33.3%), because they're doing the same amount of work and returning the exact same data. Which is all true.
However, there's a problem here that is not immediately obvious. First, you'll find a warning on the root node operator (SELECT). It tells you about the implicit conversion that you've already noted, but it also mentions an excessive memory grant. Next, if you change "Line Widths By" from "Rows" to "Data Size (MB)" (something you can only do in Plan Explorer), you will see quite different diagrams. The first one doesn't change in any discernible way, but the second one shows some alarmingly large data flows.
Where does all this extra data come from? Estimated row size. SQL Server assumes that, on the average row, any variable-width column will be 50% populated. The cap for varchar/nvarchar is 8K, even for max, so on the plans where all of the rows from both tables have to be upsized to varchar(max), SQL Server assumes the string column will carry 4K of data on every row, which you can see if you hover over the large connector lines. And the tooltip for the SELECT operator shows the grant details, and the ginormous delta between the amount of memory SQL Server thought it would need and the amount it actually needed. You'll see a similar symptom even if you only pull a single row from the table with the max column.

So, yes, you'll potentially see performance problems that have nothing to do with the conversion itself. In this specific example the end result isn't affected, and the memory hogging has no ill effect, but imagine if you had 20 or 100 or 5,000 concurrent users running this query.
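As for avoiding it without changing the table structure: one option worth considering, sketched below with the placeholder tables dbo.t1 (bounded varchar column v) and dbo.t3 (varchar(max) column v), is to CAST the max side down to a bounded length you know the data fits in, so the whole UNION keeps the smaller type and the smaller estimated row size. The VARCHAR(32) cap here is an assumption; note that any value longer than the cap would be silently truncated:

```sql
-- Explicitly downsize the varchar(max) side so the UNION's result type
-- (and the estimated row size driving the memory grant) stays bounded.
SELECT v FROM dbo.t1
UNION ALL
SELECT CAST(v AS VARCHAR(32))   -- assumed cap; values longer than 32 get truncated
FROM dbo.t3
ORDER BY v;
```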