Ubuntu – Working with text files encoded as Windows-1250 and UTF-8

Tags: bash, command line, encoding, windows

I switch between Ubuntu and Windows frequently, so I keep running into encoding problems with text files.

If I save a text file in Ubuntu, everything works fine in both systems.
But Ubuntu doesn't detect the encoding of files saved in Windows: every time I open a "Windows file" in an Ubuntu text editor, I have to change the encoding options.

The solution is to convert the encoding from Windows-1250 to UTF-8.

So the question is: how do I open each file as Windows-1250 and save it as UTF-8, for every file in the sub-directories of the current directory (i.e. recursively)? Can I do this in the terminal, or do I need an external application?

I'm looking forward to your help.

Best Answer

I prefer to use recode for this. It's not installed by default, but it is available through the package of the same name. It also changes CRLF line endings to LF.

sudo apt-get install recode
recode cp1250.. file.txt
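
If you want to confirm what recode did, you can check the detected encoding before and after with the file command. A minimal sketch, assuming a file named file.txt; the exact wording of file's output can vary between versions:

file -i file.txt          # before: reports some 8-bit charset for the Windows file
recode cp1250.. file.txt
file -i file.txt          # after: should report charset=utf-8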

You can do this for all .txt files in a directory:

recode cp1250.. ./*.txt

And recursively, by combining it with find:

find . -type f -name "*.txt" -exec recode cp1250.. {} +
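
If your Windows-encoded files don't all end in .txt, adjust the -name pattern or drop it entirely. A dry run with -print first shows exactly which files would be touched (a sketch, assuming you run it from the directory you want to convert):

find . -type f -name "*.txt" -print                      # list the files first
find . -type f -name "*.txt" -exec recode cp1250.. {} +  # then convert them in place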

The standard option is the iconv command, which is installed by default, but it does not change the line endings, so you need to do that in a separate step:

iconv -f cp1250 < file.txt | sed $'s/\r$//' > newfile.txt
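
To apply the iconv/sed pipeline recursively, you can loop over the files found by find and write each result to a temporary file before replacing the original. A minimal sketch, assuming every matched file really is Windows-1250 (try it on a backup copy first):

find . -type f -name "*.txt" | while read -r f; do
    # convert encoding, strip trailing CR, then replace the original file
    iconv -f cp1250 -t utf-8 "$f" | sed $'s/\r$//' > "$f.tmp" && mv "$f.tmp" "$f"
done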

In the long run, I'd recommend changing your Windows editor's default character set and line endings to UTF-8 and Unix line endings (LF, \n) to avoid having to do the conversion after the fact.


CR means Carriage Return (\r)
LF means Line Feed (\n)

Windows uses both, but Unix-like systems use only LF.
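
To see which line endings a file currently has, the file command mentions CRLF terminators explicitly for Windows-style files. The file names here are just examples, and the output wording may differ slightly between versions:

file windowsfile.txt   # e.g. "ASCII text, with CRLF line terminators"
file unixfile.txt      # e.g. "ASCII text"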