When I try to save a text file with non-English text in Notepad, I get an option to choose between Unicode, Unicode Big Endian and UTF-8. What is the difference between these formats?
Assuming I do not want any backward compatibility (with older OS versions or apps) and I do not care about the file size, which of these formats is better?
(Assume that the text can be in languages like Chinese or Japanese, in addition to other languages.)
Note: From the answers and comments below, it seems that in Notepad lingo, "Unicode" is UTF-16 (little-endian), "Unicode Big Endian" is UTF-16 (big-endian), and "UTF-8" is, well, UTF-8.
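To see what the three Notepad options mean at the byte level, here is a small Python sketch. It assumes the mapping in the note above, and it assumes a byte-order mark (BOM) is written at the start of each file, as older versions of Notepad did even for UTF-8. The function name `notepad_bytes` and the sample string are only for illustration. The codecs and `codecs.BOM_*` constants are from Python's standard library.

```python
import codecs
from typing import Dict

# Sample text mixing Japanese and ASCII (arbitrary choice for illustration).
TEXT = "日本 A"

def notepad_bytes(text: str) -> Dict[str, bytes]:
    """Sketch of the bytes each Notepad option would produce,
    assuming a byte-order mark (BOM) is written at the start of the file."""
    return {
        # "Unicode" in Notepad: UTF-16, little-endian (BOM = FF FE)
        "Unicode": codecs.BOM_UTF16_LE + text.encode("utf-16-le"),
        # "Unicode Big Endian": UTF-16, big-endian (BOM = FE FF)
        "Unicode Big Endian": codecs.BOM_UTF16_BE + text.encode("utf-16-be"),
        # "UTF-8": byte order doesn't matter, but older Notepad still wrote EF BB BF
        "UTF-8": codecs.BOM_UTF8 + text.encode("utf-8"),
    }

if __name__ == "__main__":
    for name, data in notepad_bytes(TEXT).items():
        print(f"{name:20} {data.hex(' ')}")
```

The two UTF-16 variants hold the same 16-bit units with the bytes of each unit swapped. The BOM tells a reader which order was used. UTF-8 is a byte-oriented encoding, so it has no byte-order question. Here each of the two kanji takes 3 bytes in UTF-8 but 2 bytes in UTF-16.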
Best Answer
Dunno. Which is better: a saw or a hammer? :-)
Unicode isn't UTF
There's a passage in the article that's more relevant to the subject at hand, though:
UTF-32 focuses on exhaustiveness and fixed-length representation, using 4 bytes for all characters. It’s the most straightforward translation, directly mapping each Unicode code point to 4 bytes. Obviously, it’s not very size-efficient.
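Here is a quick Python check of the claim that UTF-32 maps each code point straight to 4 bytes. Every character takes exactly 4 bytes, whether it is ASCII, CJK or an emoji, and those bytes are the code point's number. The big-endian codec `utf-32-be` is used here only so that the bytes print in natural order with no BOM. The helper name `utf32_units` is hypothetical.

```python
def utf32_units(text: str) -> list:
    """Split UTF-32 (big-endian, no BOM) output into one 4-byte unit per character."""
    data = text.encode("utf-32-be")
    return [data[i:i + 4] for i in range(0, len(data), 4)]

if __name__ == "__main__":
    sample = "A日😀"  # ASCII, a BMP CJK character, and a character outside the BMP
    for ch, unit in zip(sample, utf32_units(sample)):
        # The 4 bytes, read as a big-endian integer, are exactly the code point.
        assert int.from_bytes(unit, "big") == ord(ch)
        print(f"U+{ord(ch):04X} -> {unit.hex(' ')}")
```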
UTF-16 is a compromise, using 2 bytes most of the time but expanding to 2 * 2 bytes per character for characters not included in the Basic Multilingual Plane (BMP).
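The "2 * 2 bytes" case is called a surrogate pair. The sketch below computes the pair by hand for a code point outside the BMP and compares the result with Python's own `utf-16-be` codec. It follows the UTF-16 definition: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves and add them to 0xD800 and 0xDC00. The function name `to_utf16_units` is hypothetical. The sketch handles one code point at a time and is not a full encoder.

```python
def to_utf16_units(cp: int) -> list:
    """Return the UTF-16 code units (16-bit ints) for one Unicode code point."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: {cp:#x}")
    if cp < 0x10000:
        # Inside the BMP: a single 2-byte unit equal to the code point.
        return [cp]
    v = cp - 0x10000             # leaves a 20-bit value
    high = 0xD800 + (v >> 10)    # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)   # bottom 10 bits -> low (trail) surrogate
    return [high, low]

if __name__ == "__main__":
    for ch in "日😀":
        units = to_utf16_units(ord(ch))
        manual = b"".join(u.to_bytes(2, "big") for u in units)
        assert manual == ch.encode("utf-16-be")  # matches Python's codec
        print(f"U+{ord(ch):04X}: {len(manual)} bytes -> {manual.hex(' ')}")
```

Common Chinese and Japanese characters, such as 日 (U+65E5), sit inside the BMP and take a single 2-byte unit. Only characters beyond U+FFFF take the 4-byte form. Examples include emoji and the rarer CJK ideographs in the supplementary planes.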
Also see The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)