SQL Server – How to Ensure a SQL Script Runs with ANSI Encoding

Tags: encoding, sql-server, sql-server-2008-r2, unicode

Maybe this is a stupid question or I'm asking it in the wrong way.

How can I be sure that a script (with thousands of lines) is running with ANSI encoding?

Let's say we create a script using Notepad++ (our programmers write for SQL Server and Oracle at the same time), and we save it with Encoding > Encode in ANSI:

[screenshot: Notepad++ Encoding menu with "Encode in ANSI" selected]

OK. Then, we have an Â character inside our script. If our client simply copies this script into some tool that uses a different encoding (really, I don't know why, but some users do this, and I know for sure it's not to find errors), this Â will be displayed differently, for example as the raw byte xC2; and since our script has thousands of rows, nobody on the client side will notice.

We know users don't check for this; they will not read the entire script to be sure everything is OK. Not even the DBAs who are paid to do this will do it (I'm a DBA, and I for sure read all scripts all the time).

So, how can I be sure that, when the user presses F5, the whole script is encoded in ANSI, the same way we sent it, without these strange characters? Can we only achieve this with the right collation for the database?

I was trying to come up with something like this for the first row of the script: a CASE WHEN that compares a literal character against its ASCII()/CHAR() round trip (using ASCII to test):

SELECT CHAR(ASCII('ã')) AS Teste_CHAR, CHAR(227) AS Teste_CHAR_ASCII

Solomon, this is the query:

SELECT *,
       CASE WHEN Teste_CHAR = Teste_CHAR_ASCII
            THEN 'OK'
            ELSE 'Erro'
       END AS STATUS_TESTE,
       CASE WHEN Teste_CHAR = Teste_CHAR_ASCII
            THEN 'Everything is OK'
            ELSE 'Script will not run. Your encoding is different from ours'
       END AS Mensagem_TESTE
FROM (
    SELECT CHAR(ASCII('ã')) AS Teste_CHAR, CHAR(227) AS Teste_CHAR_ASCII
) A

Best Answer

So, how can I be sure, that when the user press F5, all script is encoded with ANSI, the same way we sent it, without these strange characters?

You can't be sure. This is just the unfortunately complicated nature of text encoding, especially non-Unicode encodings. Everything is just bytes. What we see on the screen is just an interpretation of those bytes. One encoding may certainly display a different "character" than another encoding for the same byte, or byte-sequence (depending on the encoding), but technically, bytes are bytes and all bytes are valid.
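You can see this "bytes are bytes" point outside of SQL Server entirely. As a small illustration (Python here is just a stand-in for any tool that decodes a file), the very same two bytes read as one character under one encoding and two characters under another:

```python
# The same two bytes, interpreted two ways. 0xC3 0xA3 is the UTF-8
# encoding of 'ã', but Windows-1252 (the usual Western "ANSI" page)
# happily reads the same bytes as two separate characters.
data = b'\xc3\xa3'

print(data.decode('utf-8'))    # one character:  ã
print(data.decode('cp1252'))   # two characters: Ã£
```

Neither interpretation is "wrong" at the byte level, which is exactly why no tool can warn you that the interpretation changed.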

In the case of Â, you can't detect a difference since there is no difference. You only see the xC2 in Notepad++ because that is the byte value of that character, yet that, by itself, is not a valid UTF-8 or UTF-16 / UCS-2 byte sequence, so Notepad++ shows you just the byte by itself.
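Sketched in Python (again only as an illustration of what Notepad++ is doing internally):

```python
# 'Â' is the single byte 0xC2 in Windows-1252 (ANSI).
assert 'Â'.encode('cp1252') == b'\xc2'

# But 0xC2 by itself is an incomplete UTF-8 sequence: it is a lead
# byte that expects one continuation byte after it. A strict UTF-8
# decoder therefore rejects it, which is why an editor viewing the
# file as UTF-8 falls back to showing the raw byte as xC2.
try:
    b'\xc2'.decode('utf-8')
except UnicodeDecodeError as e:
    print('not valid UTF-8:', e.reason)
```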

Now, a switch between one of the Unicode encodings (UTF-8, UTF-16, or UTF-32) and an 8-bit encoding can be detected: find a character that is not in any 8-bit code page and compare it to ? / CHAR(63); if they match, then you are no longer using a Unicode encoding.
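That ? substitution can be emulated outside SQL Server. A Python sketch of the idea (illustrative only, not the SQL Server code path — but SQL Server applies the same '?' fallback when converting NVARCHAR data to a VARCHAR code page that lacks the character):

```python
# 'Ω' exists in Unicode but has no slot in Windows-1252, so a lossy
# conversion to that code page substitutes '?' (0x3F, i.e. CHAR(63)).
# If your test character comes back as '?', the Unicode encoding
# has been lost somewhere along the way.
converted = 'Ω'.encode('cp1252', errors='replace')
print(converted)                 # b'?'
print(converted == b'?')         # True -> no longer a Unicode encoding
```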

The deficiency here is that 8-bit encodings have no way to indicate which encoding / code page they are; you just have to know. The Unicode encodings, however, have the option of placing a few bytes at the beginning of the file to indicate the encoding being used. This sequence of bytes is called the Byte Order Mark (BOM), and when the file is read in the correct encoding, it is invisible.

So, your best bet is to use one of the Unicode encodings and be sure to save files with the Byte Order Mark (BOM), since you usually have the choice to save in a Unicode encoding with or without the BOM. In Notepad++ (which I use), the two UCS-2 options both include the BOM, but UTF-8 offers the choice. If your scripts are currently ANSI, then in the Encoding menu in Notepad++, select Convert to UTF-8-BOM and save the file. Then, when copying and pasting into SSMS, all should be fine. And opening that file in most editors will automatically detect that it is encoded as UTF-8 because the BOM is there.
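As a minimal sketch of what the BOM buys you (Python for illustration; the file name script.sql is just an example), writing with the "utf-8-sig" codec prepends the three BOM bytes, and any tool can then sniff them to identify the encoding:

```python
import codecs
import os
import tempfile

sql = "SELECT CHAR(ASCII('ã')) AS Teste_CHAR;"

# 'utf-8-sig' writes the BOM bytes (EF BB BF) before the text.
path = os.path.join(tempfile.mkdtemp(), 'script.sql')
with open(path, 'w', encoding='utf-8-sig') as f:
    f.write(sql)

# Any reader can now inspect the first three bytes to detect UTF-8,
# instead of having to guess which code page the file is in.
with open(path, 'rb') as f:
    raw = f.read()
print(raw.startswith(codecs.BOM_UTF8))   # True
```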

Can we only achieve this with the right collate for the database?

This has nothing to do with SQL Server. This has to do with the client tool and what encoding it is using. SSMS is almost certainly using UTF-16 LE (Little Endian) as that is what Windows / SQL Server / .NET uses.


Regarding the query recently added to the end of the question:

The ã has a value of 0xE3 in ANSI encoding, which isn't valid on its own in UTF-8 or UTF-16. In Notepad++, changing the encoding to UTF-8 (using "Encode in", not "Convert to") results in it showing just xE3. Copying and pasting that version of the query into SSMS takes that byte plus the next one (the one used for the closing ') and converts them into a single character, which breaks the query because there is no longer a closing quote. You can fix that by adding 2 spaces after the ã so that it looks like:

CHAR(ASCII('ã  '))

That will still work as expected when the encoding has not been changed, since the ASCII function only returns the value of the first character and the additional characters (the 2 spaces) are ignored.
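The byte-level reason the two spaces help, sketched in Python (illustrative only):

```python
# In Windows-1252 (ANSI), 'ã' is the single byte 0xE3.
assert 'ã'.encode('cp1252') == b'\xe3'

# In UTF-8, 0xE3 is a lead byte announcing a 3-byte sequence, so it
# swallows the bytes that follow it. The two padding spaces give it
# bytes to swallow other than the closing quote. A strict decoder
# still rejects the sequence (0x20 is not a continuation byte);
# more lenient ones, as described above for SSMS, fold the three
# bytes into one bogus character -- either way, the CASE test fails
# loudly instead of the script breaking silently.
try:
    b'\xe3\x20\x20'.decode('utf-8')
except UnicodeDecodeError as e:
    print('not valid UTF-8:', e.reason)
```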

If that script is opened as, or changed to, UTF-8, it will appear as follows in Notepad++:

CHAR(ASCII('xE3  '))

and that xE3 will be a single "character". Copying and pasting that version of the query into SSMS will show up as follows:

CHAR(ASCII('㠠'))

And running that will produce the desired "error" result.

HOWEVER, please be warned that this is not a foolproof / guaranteed test. It mainly indicates that the script was incorrectly opened as UTF-8, UTF-16, or an 8-bit code page that does not contain the ã character.

This approach does not indicate an error if the script is opened as an 8-bit encoding that is not ANSI but still contains the ã character, and could be misinterpreting (i.e. changing) other characters.

The only way to guarantee ANSI encoding is to find a character that is a) not available in any other 8-bit code page, and b) not the same in UTF-8 or UTF-16. I am not aware of any such character, though I also have not checked them all against all available code pages.
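If you wanted to hunt for such a character, here is a rough starting point (a hypothetical sketch: it assumes Python's cp125x codec tables match the Windows ANSI code pages, and it only covers the Windows-125x family, not every 8-bit code page in existence):

```python
# Which single-byte Windows code pages can represent a given character?
# A character present in exactly one page -- and encoded differently in
# UTF-8 -- would make the encoding test airtight for that page.
PAGES = ['cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254',
         'cp1255', 'cp1256', 'cp1257', 'cp1258']

def pages_containing(ch):
    found = []
    for page in PAGES:
        try:
            ch.encode(page)        # succeeds only if the page has a slot
            found.append(page)
        except UnicodeEncodeError:
            pass
    return found

# 'ã' appears in more than one page, so it cannot prove cp1252 alone.
print(pages_containing('ã'))
```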

BUT, if you are only dealing with people opening the files as UTF-8, then making the adjustment shown above should work for that situation.