I am loading Japanese names from a CSV file into a SQL table, and the characters are being changed into question marks. The column storing these values is varchar. I understand that a varchar column is not Unicode, and that this is why some of the characters are changed to ?.
However, how is it that an existing value written in Japanese is stored in that varchar column, when ideally it should be in nvarchar? Is there a way to convert nvarchar to varchar?
The database is out of our control and we cannot change the schema.
Best Answer
As Tibor mentioned, the Japanese_Unicode_CI_AS collation (all Japanese_* collations, actually) can store Japanese characters in VARCHAR columns due to Windows code page 932 being a Double-Byte Character Set (DBCS). I believe there are just under 7,800 Japanese characters mapped to the Windows-932 code page. However, Unicode contains more than 7,800 Japanese characters.
It would be very helpful (maybe even necessary) to know some things in order to narrow down the problem:
- What tool is being used to import the file? BULK INSERT / OPENROWSET(BULK...)? BCP? SSIS? Something else?
- Are all of the Japanese characters coming through as question marks, or only some of them?
- You said it is "changing some of the characters to ??", so: is each affected character becoming one question mark (?), two (??), or some of each?
Without knowing the answers to those questions, I can say that there are two main possibilities:
1. (all Japanese characters imported as ?): You aren't telling the import tool what the encoding of the CSV file is. Is it encoded as Windows-932 (or possibly Windows-31J)? Or is it a Unicode encoding such as UTF-8 or UTF-16 (which might be listed as "UCS-2" or "Unicode", depending on the tool)? If you are using BCP, you would need to use the command-line options of -c -C 932 for Windows-932, or -c -C 65001 for UTF-8. This problem should be fixable simply by setting the tool to use the correct code page.
2. (some Japanese characters imported as ? or ??): Even if you are telling the import tool the correct encoding of the file, there are still Japanese characters that are not encoded in the Windows-932 code page; those characters have no mapping in the VARCHAR column's code page and are converted to ?.
This problem can only be fixed by doing one of the following:
- Change the datatype of the column to NVARCHAR (though you said that you can't change the schema)
- Change the collation of the column to a UTF-8 collation (names ending in _UTF8; introduced in SQL Server 2019)
For info on working with collations / Unicode / encodings, please visit: Collations Info
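To see why the _UTF8 collation route works even though the column stays VARCHAR: UTF-8 can represent every Unicode character as plain bytes, so nothing is lost the way it is under code page 932. A quick Python sketch (the character is an illustrative choice):

```python
# A kanji that exists in Unicode but has no mapping in code page 932.
ch = "𠮟"

# Stored via a Windows-932 VARCHAR column, it degrades to '?'.
print(ch.encode("cp932", errors="replace"))  # b'?'

# Stored via a _UTF8 collation, the VARCHAR bytes are UTF-8 and the
# character round-trips without loss.
utf8_bytes = ch.encode("utf-8")
print(utf8_bytes.decode("utf-8") == ch)  # True
```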