Sql-server – FORMAT returns large row size and data size

sql serversql-server-2012

I am surprise with one of my findings that using a FORMAT () does have very big impact on the row size and data size. It is almost 250x more of the size of not applying FORMAT ().

My question is:
1) Why does using FORMAT() have such big impact on the size? To me, is just 1.23 vs $1.23 difference, which is probably 1 character difference. And does it matter with such huge size?

2) Why do we still be encouraged to use FORMAT() in SQL server instead of using string concatenation as below. Since below is only 2X data size, and using format returns 250x data size. OR is that data size is not a critical measurement?

SELECT '$' + CONVERT(varchar(10), UnitPrice) FROM Sales.SalesOrderDetail;

3) Does having data size of 464MB means i will be returning 464MB of data to client?

========================================================

Below is my findings with AdventureWorks2012 database.

SELECT UnitPrice FROM Sales.SalesOrderDetail;

Actual Number of Rows: 121317
Estimated Number of Rows: 121317
Estimated Row Size: 15B
Estimated Data Size: 1777KB

SELECT '$' + CONVERT(varchar (10), UnitPrice) FROM Sales.SalesOrderDetail;

Actual Number of Rows: 121317
Estimated Row Size: 26B
Estimated Data Size: 3060KB

SELECT FORMAT(UnitPrice, 'c') FROM Sales.SalesOrderDetail;

Actual Number of Rows: 121317
Estimated Row Size: 4011B
Estimated Data Size: 464MB

Best Answer

FORMAT() has an (admittedly undocumented) output of nvarchar(4000), at least in the cases of converting ints and dates to strings. The documentation simply says...

The length of the return value is determined by the format.

But then doesn't explain or provide any examples. You can see what I'm describing, though, with:

SELECT TOP (1) object_id, x = FORMAT(object_id, 'en-us') 
  INTO #blat FROM sys.all_objects;

EXEC tempdb.sys.sp_help N'#blat';

Result is that x is an nvarchar with a length of 8,000 (this is the number of bytes, not the number of characters).

Estimated row size is based on an assumption that variable width values will be half-populated. So, it expects 2,000 characters (4,000 bytes) on each row (even if the particular parameters you supply can't possibly result in that many characters). I demonstrate this (but not with FORMAT() specifically) in another answer, Would using varchar(5000) be bad compared to varchar(255)?

This is one reason I prefer to use CONVERT() and TRY_CONVERT() equivalents instead of FORMAT(), in spite of its syntactic sugar. At least with those you can convert to a defined width instead of relying on it "being determined by the format." Which may or may help estimated size, depending on the query. Another example that demonstrates the benefit here (even though it requires uglier code):

DECLARE @m float = 32.74532323;

SELECT 
    a = @m, 
    b = FORMAT(@m, 'c'), 
    c = '$' + CONVERT(varchar(12), CONVERT(decimal(8,2),@m))
 INTO #splunge 
 FROM sys.all_objects;

EXEC tempdb.sys.sp_help N'#splunge';

Results:

a    float
b    nvarchar(4000)
c    varchar(13)

Another reason I prefer to use CONVERT() and TRY_CONVERT() is that FORMAT() sucks from a performance perspective (see FORMAT() is nice and all, but…).

Also please don't ever use variable-width types like varchar without also specifying a length.