Sql-server – Why does the varchar datatype allow unicode values

character-setdatatypesencodingsql serverunicode

I have a table with a varchar column. It is allowing Trademark(™), copyright(©) and other Unicode characters as shown below.

Create table VarcharUnicodeCheck
(
col1 varchar(100)
)

insert into VarcharUnicodeCheck (col1) values ('MyCompany')
insert into VarcharUnicodeCheck (col1) values ('MyCompany™')
insert into VarcharUnicodeCheck (col1) values ('MyCompany░')
insert into VarcharUnicodeCheck (col1) values ('MyCompanyï')
insert into VarcharUnicodeCheck (col1) values ('MyCompany')

select * from VarcharUnicodeCheck

But the definition of varchar says, it allows non-unicode string data. But the Trademark(™) and Registered(®) symbols are Unicode characters. Does the definition contradicts the property of varchar datatype? I read couple of links like first one and second one. But still I could not understand why it allows unicode string when the definition says that it allows only non-unicode string values.

Best Answer

But the Trademark(™) and Registered(®) symbols are Unicode characters.

Your are wrong here. Your strings contain only ascii characters.

Here is a simple test that shows you that your characters are all ascii (+ some extended ascii with ascii codes between 128 and 255):

declare @VarcharUnicodeCheck table
(
col1 varchar(100)
)

insert into @VarcharUnicodeCheck (col1) values ('MyCompany')
insert into @VarcharUnicodeCheck (col1) values ('MyCompany™')
insert into @VarcharUnicodeCheck (col1) values ('MyCompany░')
insert into @VarcharUnicodeCheck (col1) values ('MyCompanyï')
insert into @VarcharUnicodeCheck (col1) values ('MyCompany')

select *,
        right(col1, 1)as last_char, 
        ascii(right(col1, 1)) as_last_char_ascii
from @VarcharUnicodeCheck;

Here you can clearly see that all your characters are 1-byte encoded:

enter image description here

Yes they are not pure ascii characters but they are Extended ASCII.

Here I show you real unicode character Trademark(™) and its code and binary representation:

declare @t table (uni_ch nchar(1), ascii_ch char(1));
insert into @t values (N'™', '™');

select unicode(uni_ch) as [unicode of ™], 
       ascii(ascii_ch) [ascii of ™], 
       cast(uni_ch as varbinary(10)) as [uni_ch as varbinary], 
       cast(ascii_ch as varbinary(10)) as [ascii_ch as varbinary]
from @t;

enter image description here

Finally, you can see that Trademark(™) unicode character has 8482 code and not 153:

select nchar(8482), nchar(153)