SQL Server – Converting varbinary Data to varchar Leads to Unexpected Results

ccharacter-setsql serversql-server-2008

I need to fix some data conversion issue in SQL server 2008. I got some requirement change on storing the data. Without much care I converted existing varbinary data to varchar using CONVERT(NVARCHAR(max), @bytearraydata, 1).

The same conversion in C# is done using Encoding.Default.GetString and Encoding.Default.GetBytes methods. Encoding.Default.GetBytes(string) gets back the bytearray as it was earlier. But when I try to get back the byte array of string which I converted using CONVERT() gives me a wrong result.

My work is to fetch byte array stored as string the database and convert it to byte array and finally render the content as PDF. Data going through encoding mechanism (while saving and while fetching) works fine for me. But when I try to fetch the data which was converted using CONVERT it is failing to generate PDF.

How can I resolve this problem?

Summary:

Byte array column has been changed to string.

Existing data conversion done using this function:

Convert(NVARCHAR(MAX), @bytearraydata, 1)

In the application byte array conversion is done using Encoding.Default.GetString(bytearraydata)

Are Encoding and CONVERT not compatible?

Best Answer

When you use Encoding.Default the result is dependent on local settings:

An encoding for the operating system's current ANSI code page.

The specification is pretty direct about the dangers of using the default encoding and very specifically calls out the recommendation not to use it:

Different computers can use different encodings as the default, and the default encoding can even change on a single computer. Therefore, data streamed from one computer to another or even retrieved at different times on the same computer might be translated incorrectly. In addition, the encoding returned by the Default property uses best-fit fallback to map unsupported characters to characters supported by the code page. For these two reasons, using the default encoding is generally not recommended

Now, for whatever reason, you expect the random current local encoding to match the server encoding. Even if the CONVERT function would do what you believe it does, the results would be random and unpredictable because:

  • the server local code page may differ from the client code page.
  • a code page change would render the persisted data unreadable because it was written with one encoding and then attempted to be read with another.

Furthermore, CONVERT does not do what you expect. CONVERT will cast a VARBINARY to an NVARCHAR using the UCS-2 encoding, since this is the encoding SQL Server uses for NVARCHAR data.

I suggest you approach your urgent problem by first reading this article The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!). Follow up with International Considerations for SQL Server.