SQL Server – Automatic Translation when Converting Unicode to non-Unicode / NVARCHAR to VARCHAR

collation, encoding, sql-server, unicode

Unicode code point 9619 (U+2593) is a character called "Dark shade" (http://unicode-table.com/en/search/?q=9619).

Using the SQL_Latin1_General_CP1_CI_AS collation and code page 1252, I would expect casting / converting that Unicode character to a non-Unicode data type to result in a question mark (?). Code page 1252 does not appear to contain this character, and a question mark appears to be what SQL Server produces when a conversion cannot take place.

So my question is: why does SQL Server instead convert this character to character code 166 (the value returned by ASCII()), which is "Pipe, Broken vertical bar": ¦ ?

SELECT NCHAR(9619), CAST(NCHAR(9619) AS CHAR(1)), ASCII(CAST(NCHAR(9619) AS CHAR(1)))
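The premise of the question can be checked outside SQL Server. The following is a minimal sketch in Python, used here only because its cp1252 codec performs a strict lookup with no best-fit step; it is not how SQL Server does the conversion, and the variable names are illustrative.

```python
# Python's cp1252 codec maps only characters that the code page actually
# contains; it does not perform any best-fit substitution.
dark_shade = chr(9619)  # U+2593 DARK SHADE

# errors="replace" substitutes '?' for anything the code page cannot represent,
# which is the result the question expected from SQL Server.
replaced = dark_shade.encode("cp1252", errors="replace")
print(hex(ord(dark_shade)), replaced)

# Byte 166 (0xA6) in code page 1252 is the broken bar that SQL Server returned.
broken_bar = bytes([166]).decode("cp1252")
print(broken_bar, hex(ord(broken_bar)))
```

So the code page really has no slot for U+2593, and 166 is a code page 1252 value rather than a 7-bit ASCII code.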

Best Answer

Why does SQL convert Unicode 9619 to ASCII code 166?

SQL Server is not employing any special custom logic here; it is using standard operating system services to perform the conversion.

Specifically, the SQL Server type and expression service (sqlTsEs) calls into the OS routine WideCharToMultiByte in kernel32.dll. SQL Server sets the input parameters to WideCharToMultiByte such that the routine performs a 'quick translation'. This is faster than requesting that a specific default character be used when no direct translation exists.

The quick translation relies on the target code page to perform a best-fit mapping for any unmatched characters, as mentioned in the link Martin Smith provided in a comment to the question:

Best-fit strategies vary for different code pages, and they are not documented in detail.
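To make the difference between the two behaviors concrete, here is a rough sketch in Python. It is an assumption-laden model, not the Windows implementation: HYPOTHETICAL_BEST_FIT_1252 and to_cp1252 are made-up names, and the table holds a single entry taken from the result observed in the question rather than from any documented mapping.

```python
# Illustrative stand-in for a code page 1252 best-fit table. The one pair here
# is the behavior observed in the question, not a documented mapping.
HYPOTHETICAL_BEST_FIT_1252 = {0x2593: 0xA6}  # DARK SHADE -> BROKEN BAR


def to_cp1252(text: str, best_fit: bool) -> bytes:
    """Convert text to code page 1252 bytes, one character at a time."""
    out = bytearray()
    for ch in text:
        try:
            # Exact mapping: the character exists in the code page.
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            # No exact mapping: consult the best-fit table if allowed,
            # otherwise (or if the table has no entry) use the default '?'.
            fallback = HYPOTHETICAL_BEST_FIT_1252.get(ord(ch)) if best_fit else None
            out.append(fallback if fallback is not None else ord("?"))
    return bytes(out)


print(to_cp1252("A" + chr(9619), best_fit=True))
print(to_cp1252("A" + chr(9619), best_fit=False))
```

With best_fit=True the dark shade becomes byte 0xA6, matching what the question saw; with best_fit=False it becomes the question mark the asker expected. Characters with neither an exact nor a best-fit entry fall back to '?' in both modes.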

When the input parameters are set for a quick translation, WideCharToMultiByte calls OS service GetMBNoDefault (source). Inspecting the SQL Server call stack when performing the conversion specified in the question confirms this:

[Image: SQL Server stack trace]