Exporting from Excel to CSV replaces Japanese characters with ??? even though the Windows and Office locale is Japan/Japanese

Tags: character-encoding, csv, encoding, japanese, microsoft-excel

I am exporting an Excel 2016 workbook that contains Japanese characters to CSV. (Note: I am not using the provided "CSV UTF-8" format.) In the process, every Japanese character is replaced with '?'.

My Windows and Office locale is Japan/Japanese, and the Windows and Office display language and regional format are all set to Japanese.

I understand that Excel uses a code page to save a CSV file in a particular encoding. My understanding was that this should be Shift-JIS, the default encoding for the Japanese locale. If so, why is information lost and replaced with '?'?
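One way to check which encoding an export actually used is to inspect the raw bytes of the file. The sketch below is a rough heuristic, not a reliable detector: `classify_csv_bytes` is a hypothetical helper, and it assumes the only candidates are UTF-8 with a BOM (what the "CSV UTF-8" option writes), plain ASCII, and Shift-JIS (Python's `cp932` codec, i.e. Windows code page 932).

```python
def classify_csv_bytes(data: bytes) -> str:
    """Heuristically guess how a CSV export was encoded (hypothetical helper)."""
    # A UTF-8 byte order mark at the start of the file
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"
    # No byte above 0x7F: no multibyte sequences at all, so any Japanese
    # text must already have been replaced (e.g. by '?') before writing
    if all(b < 0x80 for b in data):
        return "ascii"
    # Non-ASCII bytes that form valid Shift-JIS (code page 932) sequences
    try:
        data.decode("cp932")
        return "cp932"
    except UnicodeDecodeError:
        return "unknown"
```

For example, `classify_csv_bytes(open("export.csv", "rb").read())` (the file name is hypothetical) returning `"ascii"` for a sheet that contained Japanese text would indicate the characters were lost before the bytes were written, rather than written as Shift-JIS.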

What encoding does Excel actually use when saving the CSV?

(FYI: when I open a CSV file, Excel by default reads it as Shift-JIS (code page 932), as expected.)

Note: I am aware of the UTF-8 workaround. I am more interested in understanding the behavior above than in a workaround.

Thanks

Best Answer

Excel handles CSV encodings badly, and always has.

Exporting a document as CSV (Comma delimited) does not use your locale's code page; it saves the characters as ASCII. Characters that cannot be represented in ASCII are exported as question marks. Only characters in the ASCII range 0 to 127 are guaranteed to be exported correctly.
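The effect described above can be reproduced outside Excel. This is a minimal Python sketch, not Excel's actual code path: it assumes that encoding to ASCII with `errors="replace"` is a fair stand-in for a lossy conversion that substitutes '?' for unrepresentable characters, and contrasts it with encoding to Shift-JIS (`cp932`), which can hold the same text. The sample row is hypothetical data.

```python
text = "東京,大阪\n"  # hypothetical sample row with Japanese cell values

# Lossy path: ASCII has no Japanese characters, so each one becomes b"?"
ascii_bytes = text.encode("ascii", errors="replace")

# Shift-JIS (Windows code page 932) can represent these characters
cp932_bytes = text.encode("cp932")

print(ascii_bytes)                         # b'??,??\n'
print(cp932_bytes.decode("cp932") == text)  # True: the round trip is lossless
```

Once the replacement has happened, the original characters cannot be recovered from the file, which matches the '?' output described in the question.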

The reason may be that this part of Excel was written before Windows supported Unicode, but that is only a guess. Office is full of such patchwork, and one has to use whatever works.
