Sql-server – sp_send_dbmail is populating 00 bits between all characters in a result set

database-mailencodingsql servert-sqlunicode

I'm using sp_send_dbmail in a SQL agent job triggered nightly. The job queries our database and checks for product price updates, if there are some, it will send them in an email as an attachment to our e-commerce vendor. They have some automatic processes they will update our e-commerce platform, however their automated process cannot deal with the file provided by sp_send_dbmail.
sp_send_dbmail seems to be putting null characters between all characters in the result set.

See result of opening the csv in a hex editor:

HexCapture

Viewed through a text editor I see as expected:

TextCapture

sp_send_dbmail query here:

    SET @FileName = 'Update_' + CONVERT(VARCHAR(12),GETDATE),105) '.csv'        
    SET @Query = 'Some Query'

    EXEC msdb.dbo.sp_send_dbmail
        @recipients = 'me@domain.com'
        @query = @Query
        @attach_query_result_as_file = 1
        @query_attachment_filename = @FileName
        @query_result_separator = ','
        @query_result_no_padding =  1
    END

What's going on here?

****EDIT FOR ADDITIONAL CLARIFICATION****

Using Microsoft SQL Server 2008 R2
Collation is Latin1_General_CI_AS

Leaving no_padding out leaves several trailing spaces in each returned field. each space (20) separated by a null (00)

NoPadCapture

Best Answer

The file encoding for the attachments is standard UTF-16 Little Endian (LE) -- one of the various Unicode encodings. The first two bytes, shown in the first image, are FF FE. These are the Byte Order Mark (BOM), which is an explicit indication of the files encoding. And you can see in the chart on that linked Wikipedia page that FF FE indicates an encoding of UTF-16 Little Endian.

UTF-16 uses 2 or 4 bytes per "character", so the 00 null bytes are really just half of each of those characters. And, because it is Little Endian instead of Big Endian, the bytes in each pair are swapped. Meaning, the highlighted 00 in the top image is paired with the preceding 50 (for the capital P). But the pair 50 00 is just Little Endian for 00 50, which is Unicode Code Point U+0050 (there is a chart at the bottom of that page that shows the encoding variations).

If you were to open that file in any editor that auto-detected the file encoding via the BOM, it would know that it was a UTF-16 LE encoded file and display it correctly, as you would expect to see it. Just try it out in something like Notepad++.

The first thing I would do is contact the vendor and/or check the documentation to see if either the e-commerce platform already has a means of dealing with UTF-16 LE encoded files (a.k.a. "Unicode" in Microsoft-land), OR if they can make a small change to handle Unicode files. I would be surprised if they both don't already handle this in some fashion and won't do it if not already able to. Given that this world is moving towards more internationalization rather than less, you can't be the only client needing to pass in a Unicode file. Also, converting the file to ASCII Extended could lead to data loss if your data naturally has characters in it that do not fit into a single code page.

The last thing I would do is update the definition of sp_send_dbmail. I am not saying to never, ever do that, but in most cases it is unnecessary. If anything, it would be preferable to convert the encoding of the attachment after it is sent (which will probably need to be done by the recipient of the email). An intentional conversion will allow for specifying the correct encoding, whether it be UTF-8 or Extended ASCII using one of the many available code pages.