Command Line – How to Remove BOM from a UTF-8 File?

command-line, files, unicode

I have a UTF-8 encoded file with a BOM and want to remove the BOM. Are there any Linux command-line tools that can do this?

$ file test.xml
test.xml:  XML 1.0 document, UTF-8 Unicode (with BOM) text, with very long lines

Best Answer

If you're not sure whether the file contains a UTF-8 BOM, this command (assuming the GNU implementation of sed) removes the BOM if it is present and makes no changes if it isn't.

sed '1s/^\xEF\xBB\xBF//' < orig.txt > new.txt
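
To check that claim on your own system, here is a minimal sketch that runs the same command on two throwaway files, one with a BOM and one without, and compares the results. The file names are made up for the demo, and GNU sed is assumed, as above.

```shell
# Build a small test file that starts with the UTF-8 BOM (bytes EF BB BF),
# plus an identical file without it. printf octal escapes are POSIX.
printf '\357\273\277<root/>\n' > with_bom.txt
printf '<root/>\n' > without_bom.txt

# Same sed command as above (GNU sed assumed for the \xHH escapes).
sed '1s/^\xEF\xBB\xBF//' < with_bom.txt > stripped.txt
sed '1s/^\xEF\xBB\xBF//' < without_bom.txt > untouched.txt

# Dump the first three bytes of each result in hex for inspection.
od -An -tx1 -N3 stripped.txt
od -An -tx1 -N3 untouched.txt

# cmp exits 0 only when two files are byte-for-byte identical.
cmp stripped.txt without_bom.txt && echo "BOM removed"
cmp untouched.txt without_bom.txt && echo "no-BOM file unchanged"
```

If both messages are printed, the command stripped the BOM where it existed and left the other file alone.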

You can also overwrite the existing file with the -i option:

sed -i '1s/^\xEF\xBB\xBF//' orig.txt

If you are using the BSD version of sed (e.g., on macOS), which does not interpret the \x escapes itself, you need to have bash expand them with $'...' quoting:

sed $'1s/^\xef\xbb\xbf//' < orig.txt > new.txt
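
If you would rather not depend on how a particular sed or shell handles escapes, the following is a rough sketch of a different technique, not part of the sed approach above: inspect the first three bytes with od and skip them with tail. The function name strip_bom and the file names are hypothetical.

```shell
# strip_bom SRC DST: copy SRC to DST, dropping a leading UTF-8 BOM if present.
# SRC and DST must be different files, because DST is truncated before reading.
strip_bom() {
    # First three bytes as a hex string, e.g. "efbbbf" when a BOM is present.
    first=$(od -An -tx1 -N3 "$1" | tr -d ' \n')
    if [ "$first" = "efbbbf" ]; then
        tail -c +4 "$1" > "$2"   # start copying at byte 4, skipping the BOM
    else
        cat "$1" > "$2"          # no BOM: copy unchanged
    fi
}

# Demo on a throwaway file (hypothetical names).
printf '\357\273\277<root/>\n' > sample.txt
strip_bom sample.txt clean.txt
od -An -tx1 -N3 clean.txt
```

This only uses od, tr, tail, and cat, so it should behave the same with GNU and BSD userlands, at the cost of reading the file twice.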