Sql-server – UNION ALL implicit conversion for VARCHAR(MAX) column

execution-plansql serversql-server-2016

Is an implicit conversion in a UNION ALL when one of the data type is VARCHAR(MAX) something to worry about?

e.g.

DECLARE @value1 VARCHAR(1)= 'a';
DECLARE @value2 VARCHAR(MAX)= 'a';

SELECT @value1
UNION ALL
SELECT @value2; 

If you look at the execution plan there will be a CONVERT_IMPLICIT warning:

(Scalar Operator(CONVERT_IMPLICIT(varchar(max),[@value1],0))), (Scalar
Operator([@value2]))

Is this going to behave as a normal conversion i.e. convert all the VARCHAR(1) columns to VARCHAR(MAX) and have an impact on performance? Is there a way to avoid it other than changing the table structure

Best Answer

Yes, this can impact performance. Not because of the conversion itself, but because of memory requirements. Let's fire up a quick example:

CREATE TABLE dbo.t1 (i int IDENTITY(1,1) PRIMARY KEY, c1 varchar(10));
CREATE TABLE dbo.t2 (i int IDENTITY(1,1) PRIMARY KEY, c1 varchar(10));
CREATE TABLE dbo.t3 (i int IDENTITY(1,1) PRIMARY KEY, c1 varchar(max));
GO

SET NOCOUNT ON;
GO

INSERT dbo.t1(c1) SELECT TOP (1000) LEFT(name,10) FROM sys.all_columns ORDER BY name;
INSERT dbo.t2(c1) SELECT TOP (1000) LEFT(name,10) FROM sys.all_columns ORDER BY name;
INSERT dbo.t3(c1) SELECT TOP (1000) LEFT(name,10) FROM sys.all_columns ORDER BY name;
GO 10

Now, let's run a simple UNION ALL query, that does something that requires memory (like a sort), between these tables and compare 1 UNION 2 vs. 1 UNION 3 vs. 3 UNION 1 (the last one to show that order doesn't matter):

SELECT           i, c1 FROM dbo.t1
UNION ALL SELECT i, c1 FROM dbo.t2
ORDER BY c1;

SELECT           i, c1 FROM dbo.t1
UNION ALL SELECT i, c1 FROM dbo.t3
ORDER BY c1;

SELECT           i, c1 FROM dbo.t3
UNION ALL SELECT i, c1 FROM dbo.t1
ORDER BY c1;

Now if we look at the plans, they look similar enough, and in an isolated test on an otherwise idle machine, they even perform quite similarly:

enter image description here

enter image description here

enter image description here

enter image description here

And you will see similar things in Management Studio - it will tell you that all three plans have an equal share of the cost of the batch (33.3%) because they're doing the same amount of work and returning the exact same data. Which is all true.

However, there's a problem here that is not immediately obvious. First, you'll find a warning on the root node operator (SELECT). It tells you about the implicit conversion that you've already noted, but it also mentions an excessive memory grant.

Next, if you change "Line Widths By" from "Rows" to "Data Size (MB)" (something you can only do in Plan Explorer), you will see quite different diagrams. The first one doesn't change in any real discerning way, but the second one shows some alarmingly large data flows.

enter image description here

enter image description here

enter image description here

Where does all this extra data come from? Estimated row size. SQL Server assumes that the amount of data that's in any variable-width column is going to be populated, on the average row, 50%. The cap for varchar/nvarchar is 8K, even for max, so on the plans where all of the rows from both tables have to be upsized to varchar(max), SQL Server assumes that the string column will have 4K of data on every row, which you can see if you hover over the large connector lines:

enter image description here

And the tooltip for the SELECT operator shows the grant details, and the ginormous delta between the amount of memory SQL Server thought it would need and the amount it actually needed:

enter image description here

You'll see a similar symptom even if you only pull a single row from the table with the max column:

SELECT                   i, c1 FROM dbo.t1
UNION ALL SELECT TOP (1) i, c1 FROM dbo.t3
ORDER BY c1;

So, yes, you'll potentially see performance problems that have nothing to do with the conversion itself. In this specific example the end result isn't affected, and the memory hogging has no ill effect, but imagine if you had 20 or 100 or 5,000 concurrent users running this query.