Sql-server – ANSI_Padding setting

sql server

Recently I started looking at SQL Server ANSI_Padding setting. After reading BOL and some online articles, I find myself with more questions and confusion than I started with.

I understand that I would be better off sticking to ANSI_Padding On, and I understand the different behaviors with NULL and NOT NULL columns when the setting is changed to ON or OFF.

But I cannot grasp this concept. What purpose does padding serve? Doesn't appending zeros to a binary number (or spaces to a character string) change the value?

Binary number 0x0000ee is equal to 0x00ee (at least when it is converted to decimal) But how is 0x00ee00 the same as 0x00ee?

A string 'a ' (1 space) is not the same as 'a' (no space), and yet they both become 'a ' (3 spaces) in a char(3) column. How is that even acceptable?

Same confusion also exists when values are trimmed.

BOL does seem to mention that ANSI_Padding is used to control how values are stored. So my initial guess was the actual values are not changed. But BOL does not elaborate on this point, nor do any of the articles I have found so far.

Thanks for your help.

Best Answer

First - here's a recent article that shows how things work pretty well.

There's also this article from Microsoft, explaining how SQL Server handles comparisons when strings have trailing spaces.

Why have padding? For NOT NULL columns where all the data will be very close to the same length, padding everything to (say) 10 characters may result in less space consumed than adding the overhead of making the column variable length. I think that making the column NULLable requires the same overhead as making it variable length, so including the padding becomes optional.

When you mention that BOL is talking about how the values are stored - yes, this does change the value. If you store 'a' (no space) and 'a ' (one space) in a char(3) NOT NULL column, they'll both be retrieved as 'a' (two spaces). IN a char(3) NULL column, they'll be retrieved as 'a' or 'a ', depending on the ANSI_PADDING setting when the table (or column) was created.

Why does the loss or retention of trailing spaces seem so unimportant?

Basically, when comparing two strings, SQL pads the shorter string out with spaces until they're the same length. So:

SELECT CASE WHEN 'a' = 'a ' THEN 'TRUE' ELSE 'FALSE' END as String_Comp;

returns TRUE.

Also - binary() and varbinary() should maybe not be thought of as being simply binary representations of numbers (which, of course, they are), but as binary strings. In comparisons of binary values as binaries, SQL does not pay attention to trailing zeroes.

An example: 0X00EE00 is the decimal number 60928. 0X00EE is the decimal number 238.

SELECT CASE WHEN CAST(0x00EE00 as int) = CAST(0x00EE as int)
         THEN 'TRUE'
         ELSE 'FALSE'
       END as Binary_Number_Comp;

returns FALSE (60928 <> 238). However:

SELECT CASE WHEN CAST(0x00EE00 as binary(10)) = CAST(0x00EE as varbinary(6))
         THEN 'TRUE'
         ELSE 'FALSE'
       END as Binary_String_Comp;

returns TRUE; as noted by Max Vernon, the trailing zeroes are acting like a NULL terminator in a string, and are not included in the comparison.

Interestingly:

SELECT CASE WHEN 0x00EE00 = 0x00EE THEN 'TRUE' ELSE 'FALSE' END as Binary_Value_Comp;

returns TRUE, so by default this is comparing the values as binary strings, not binary numbers.

Per the docs above, a LIKE comparison is the only way to differentiate between string values based on trailing zeroes.

SELECT CASE WHEN 'a' LIKE 'a ' THEN 'TRUE' ELSE 'FALSE' END as LIKE_String_Comp;

returns FALSE.

SO, in summary, unless you really work at it, SQL Server ignores trailing spaces regardless of the ANSI_PADDING settings, which is why trimming them or not is treated so cavalierly.

And, by the way, here's a quick example of the impact of ANSI_PADDING

Query:

SET ANSI_PADDING ON;

IF (OBJECT_ID('tempdb..#test') IS NOT NULL) DROP TABLE #test;

CREATE TABLE #test
     ( my_char CHAR(10) NOT NULL
      ,my_varchar VARCHAR(10) NOT NULL
      ,my_char_NULL CHAR(10) NULL
      ,my_binary BINARY(6) NOT NULL
      ,my_varbinary VARBINARY(6) NOT NULL
      ,my_binary_NULL BINARY(6) NULL
     );

INSERT INTO #test
VALUES ('ABC   ','ABC   ','ABC   ',0xABC000,0xABC000,0xABC000);

PRINT 'ANSI_PADDING ON:';

SELECT '|' + my_char + '|' as my_char
      ,'|' + my_varchar + '|' as my_varchar
      ,'|' + my_char_NULL + '|' as my_char_NULL
      ,my_binary
      ,my_varbinary
      ,my_binary_NULL
  FROM #test;

SET ANSI_PADDING OFF;

IF (OBJECT_ID('tempdb..#test2') IS NOT NULL) DROP TABLE #test2;

CREATE TABLE #test2
     ( my_char CHAR(10) NOT NULL
      ,my_varchar VARCHAR(10) NOT NULL
      ,my_char_NULL CHAR(10) NULL
      ,my_binary BINARY(6) NOT NULL
      ,my_varbinary VARBINARY(6) NOT NULL
      ,my_binary_NULL BINARY(6) NULL
     );

INSERT INTO #test2
VALUES ('ABC   ','ABC   ','ABC   ',0xABC000,0xABC000,0xABC000);

PRINT 'ANSI_PADDING OFF:';

SELECT '|' + my_char + '|' as my_char
      ,'|' + my_varchar + '|' as my_varchar
      ,'|' + my_char_NULL + '|' as my_char_NULL
      ,my_binary
      ,my_varbinary
      ,my_binary_NULL
  FROM #test2;

Results:

ANSI_PADDING ON:
my_char      my_varchar   my_char_NULL my_binary      my_varbinary   my_binary_NULL
------------ ------------ ------------ -------------- -------------- --------------
|ABC       | |ABC   |     |ABC       | 0xABC000000000 0xABC000       0xABC000000000


ANSI_PADDING OFF:
my_char      my_varchar   my_char_NULL my_binary      my_varbinary   my_binary_NULL
------------ ------------ ------------ -------------- -------------- --------------
|ABC       | |ABC|        |ABC|        0xABC000000000 0xABC0         0xABC0



NOTE: As Max Vernon pointed out in a comment on his answer, my notes above are all about how SQL Server handles things. Equality of values in whatever application may be retrieving data from SQL Server may very well behave entirely different.