Notepad++ save as UTF-16 file without byte order mark

bom encoding notepad unicode

Is there any way to save a file in Notepad++ using the UTF-16 encoding (little endian), but without adding the byte order mark? For example, if a text file is saved in Notepad++ using the little-endian UTF-16 encoding (Encoding > UCS-2 LE BOM), the bytes FF FE are prepended to it. I would like to avoid that without having to remove them manually.

If there isn't a way to do this by default, is there a way I can create an encoding for Notepad++ which is the same as the UCS-2 LE BOM option, just without the byte order mark?

Best Answer

First, Notepad++ doesn't even support UTF-16. It offers (as the menu says) UCS-2. UTF-16 is backwards compatible with UCS-2, but the two are not the same. UCS-2 always stores each character (code point) in exactly 2 bytes, so it can only represent code points up to U+FFFF. UTF-16, the successor of UCS-2, introduced so-called surrogate pairs: two 2-byte code units that together encode a single code point above U+FFFF.

A good way to see this is to create a file containing a character outside the UCS-2 range. Try any emoji, for example 😀 (U+1F600). Paste it into a text file and save it as a UCS-2 file with Notepad++. Then re-open it. The character will no longer show up correctly, since the encoding doesn't support it.

Next, do the same thing in an editor that supports UTF-16, such as Windows Notepad. Unlike in Notepad++, if you save the file as UTF-16, the character will still be displayed correctly after saving and re-opening.
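The difference between the two encodings can also be checked outside any editor. The following is a minimal Python sketch, not anything Notepad++ itself does: the helper name `utf16le_code_units` is made up for this illustration, and it assumes Python's standard `utf-16-le` codec. It shows that a character inside the UCS-2 range takes one 16-bit code unit, while an emoji such as U+1F600 needs a surrogate pair.

```python
import struct

def utf16le_code_units(text: str) -> list:
    """Return the 16-bit code units of `text` encoded as UTF-16 LE (no BOM).

    Hypothetical helper for illustration only.
    """
    raw = text.encode("utf-16-le")
    # Each code unit is 2 bytes, little endian ("<H").
    return list(struct.unpack(f"<{len(raw) // 2}H", raw))

bmp_char = "A"        # U+0041, representable in UCS-2
emoji = "\U0001F600"  # U+1F600, above U+FFFF, not representable in UCS-2

print([hex(u) for u in utf16le_code_units(bmp_char)])  # ['0x41']
print([hex(u) for u in utf16le_code_units(emoji)])     # ['0xd83d', '0xde00']
```

The two values printed for the emoji are the high and low surrogate. A strict UCS-2 reader would treat them as two separate (and invalid) characters, which is the kind of breakage described above.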

Second, there's rarely a good reason to remove the BOM from a text file that's not UTF-8. The encoding isn't explicitly stored anywhere in the file, so a text editor has to guess it, and checking for a BOM at the start of the file is the most reliable way to do so. The BOM is effectively the header of a text file. Removing it is a bad idea.
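If some consumer of the file really does require UTF-16 LE without a BOM, one workaround is to strip the two bytes outside Notepad++ after saving. The following is only a sketch under that assumption, using Python's standard `codecs` module. The function names `encode_utf16le` and `strip_utf16le_bom` are hypothetical, and nothing here is a Notepad++ feature.

```python
import codecs

def encode_utf16le(text: str, with_bom: bool) -> bytes:
    """Encode text as UTF-16 LE, optionally prefixed with the FF FE BOM."""
    body = text.encode("utf-16-le")  # the "utf-16-le" codec never emits a BOM
    return codecs.BOM_UTF16_LE + body if with_bom else body

def strip_utf16le_bom(data: bytes) -> bytes:
    """Drop a leading FF FE if present; otherwise return the data unchanged."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):]
    return data

with_bom = encode_utf16le("Hi", with_bom=True)
without_bom = strip_utf16le_bom(with_bom)

print(with_bom.hex(" "))     # ff fe 48 00 69 00
print(without_bom.hex(" "))  # 48 00 69 00
```

Applied to a real file, the same idea would be to read the bytes in binary mode, pass them through `strip_utf16le_bom`, and write them back. As noted above, a reader then has no marker to detect the encoding from, so this only makes sense when the consumer already knows the file is UTF-16 LE.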
