I've been trying to find a reasonable solution/explanation (unsuccessfully) to find out why Excel defaults to removing the BOM when saving a file to the CSV type.
Please forgive me if you find this a duplicate of this question. This handles reading CSV files with non-ASCII encoding, but it doesn't cover saving the file back out (which is where the biggest issue lies).
Here is my current situation (which I'm going to gather is common among localized software dealing with Unicode characters and a CSV format):
-
We export data to a CSV format using UTF-16LE, ensuring the BOM is set (0xFFFE). We validate after the file is generated with a Hex editor to ensure it was set correctly.
-
Open the file in Excel (for this example we're exporting Japanese characters) and witness that Excel handles loading the file with the correct encoding.
-
Attempts to save this file will prompt you with a warning message indicating that the file may contain features that may not be compatible with Unicode encoding, but asks if you'd like to save anyway.
-
If you select the Save As dialog, it will immediately ask you to save the file as "Unicode Text" rather than CSV. If you select the "CSV" extension and save the file it removes the BOM (obviously along with all the Japanese characters).
Why would this happen? Is there a solution to this problem, or is this a known 'bug'/limitation of Excel?
Additionally (as a side issue) it appears that Excel, when loading UTF-16LE encoded CSV files, only uses TAB delimiters. Again, is this another known 'bug'/limitation of Excel?
Best Answer
There are no polite answers to questions of the form "Why does Excel do X?".
Why don't you just adopt "Save as Unicode Text" as your standard transfer format? Excel Save as ensures there's a BOM, uses TAB as a delimiter, quotes fields just like CSV ...
By the way, you don't "select the CSV extension". You have TWO choices -- format and extension. Try save as unicode text in the 'Save as type' box and manually change the extension in the 'File name' box to csv. Then exit from Excel and double-click on the file name in Windows Explorer. Opens up fine. Try it.
Update to discuss OP's comments:
=> Please read carefully what I wrote above and try it out for your self. It will always save your file with a TXT extension unless you override that: delete the
txt
in the 'File name' box and typecsv
.=> "Save as Unicode text" ensures that there is a UTF-16LE BOM at the front of the file. Try it and see.
=> and your expectations are not met, because Excel doesn't retain any info about input CSV files. There is no way to force it to do what you want to do. So you either give up or try something else, like what I'm suggesting.
=> I have experienced these problems, with Chinese data and Excel 1997, years ago. I have, before answering your question, verified with Excel 2003 and Excel 2007 that Excel is still behaving badly. I have verified that the "save as type = Unicode text, extension = CSV" workaround does indeed work. I have provided my insights.