I am exporting an Excel file (Excel 2016) containing Japanese characters to CSV. (Note: I am not using the "CSV UTF-8" format that Excel provides.) In the process, all Japanese characters are replaced with '?'.
My Windows/Office locale is Japan/Japanese, and the Windows/Office display language and formats are all Japanese.
I understand that Excel uses a code page to save a CSV file in a particular encoding. My understanding was that this should be Shift-JIS (the default encoding for the Japanese locale). If that is so, why is information lost and replaced by '?'?
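For reference, Shift-JIS as implemented by Windows code page 932 (Python's `cp932` codec) can represent ordinary Japanese text, so a file written in that code page should not lose anything. A minimal Python sketch, assuming a sample string chosen only for illustration (it is not from the original workbook):

```python
# Illustrative sample text; not taken from the original workbook.
text = "東京,大阪,名古屋"

# Encode with Windows code page 932 (Microsoft's variant of Shift-JIS).
encoded = text.encode("cp932")

# Each kanji takes two bytes in cp932; the ASCII commas take one byte each.
print(len(text), len(encoded))  # 9 16

# Decoding with the same code page restores the original text exactly,
# so nothing is lost and no '?' appears.
assert encoded.decode("cp932") == text
```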
What encoding does Excel try to save the CSV in?
(FYI: if I open a CSV, Excel by default attempts to read it as Shift-JIS, code page 932, as expected.)
Note: I am aware of the workaround of using UTF-8. I am more interested in understanding the above behavior than in a workaround.
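One way to see what Excel actually wrote is to inspect the raw bytes of the exported file. The function below is a rough heuristic sketch, not a definitive encoding detector; `describe_csv_bytes` is a hypothetical helper name, and a '?' in pure-ASCII output may be a genuine question mark rather than a substitution.

```python
def describe_csv_bytes(data: bytes) -> str:
    """Roughly classify the raw bytes of an exported CSV (heuristic only)."""
    # A UTF-8 byte order mark at the start suggests the "CSV UTF-8" format.
    if data.startswith(b"\xef\xbb\xbf"):
        return "UTF-8 with BOM"
    # Only 7-bit bytes: any Japanese text did not survive as such.
    if all(b < 0x80 for b in data):
        if b"?" in data:
            return "pure ASCII (contains '?', possibly substituted characters)"
        return "pure ASCII"
    # UTF-8 is strict, so try it before the more permissive cp932.
    try:
        data.decode("utf-8")
        return "UTF-8 without BOM"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("cp932")
        return "decodes as cp932 (Shift-JIS)"
    except UnicodeDecodeError:
        return "non-ASCII, neither UTF-8 nor cp932"
```

For example, `describe_csv_bytes(open("export.csv", "rb").read())` on a hypothetical exported file; in the situation described above one would expect the "pure ASCII" result.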
Thanks
Best Answer
Excel handles CSV encodings badly, and always has.
Exporting a document as "CSV (Comma delimited)" does not use your locale's code page; it saves the characters as ASCII. Characters that cannot be represented in ASCII are exported as question marks. Only characters in the ASCII range 0 to 127 are guaranteed to be exported correctly.
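The effect of an ASCII export with '?' substitution can be imitated in Python. This is only an illustration of the symptom, not a view into Excel's internals; the row is a made-up example, and whether Excel emits exactly one '?' per character, as Python does here, is an assumption.

```python
# Made-up row for illustration only.
row = "東京,100,大阪"

# errors="replace" substitutes '?' for every character outside 0-127.
ascii_bytes = row.encode("ascii", errors="replace")
print(ascii_bytes)  # b'??,100,??'

# The substitution is lossy: decoding returns question marks, not the kanji.
assert ascii_bytes.decode("ascii") == "??,100,??"
```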
Perhaps the reason is that this code in Excel was written before Windows supported Unicode, but that is only a guess. Office is full of such patchwork, and one has to use what works.