Cygwin or GnuWin32 provide Unix tools like iconv
and dos2unix
(and unix2dos
). Under Unix/Linux/Cygwin, you'll want to use "windows-1252" as the encoding instead of ANSI (see below). (Unless you know your system is using a codepage other than 1252 as its default codepage, in which case you'll need to tell iconv the right codepage to translate from.)
Convert from one (-f
) to the other (-t
) with:
$ iconv -f windows-1252 -t utf-8 infile > outfile
Or in a find-all-and-conquer form:
## this will clobber the original files!
$ find . -name '*.txt' -exec iconv --verbose -f windows-1252 -t utf-8 {} \> {} \;
Alternatively:
## this will clobber the original files!
$ find . -name '*.txt' -exec iconv --verbose -f windows-1252 -t utf-8 -o {} {} \;
This question has been asked many times on this site, so here's some additional information about "ANSI". In an answer to a related question, CesarB mentions:
There are several encodings which are called "ANSI" in Windows. In fact, ANSI is a misnomer. iconv has no way of guessing which you want.
The ANSI encoding is the encoding used by the "A" functions in the Windows API (the "W" functions use UTF-16). Which encoding it corresponds to usually depends on your Windows system language. The most common is CP 1252 (also known as Windows-1252). So, when your editor says ANSI, it is meaning "whatever the API functions use as the default ANSI encoding", which is the default non-Unicode encoding used in your system (and thus usually the one which is used for text files).
The page he links to gives this historical tidbit (quoted from a Microsoft PDF) on the origins of CP 1252 and ISO-8859-1, another oft-used encoding:
[...] this comes from the fact that the Windows code page 1252 was originally based on an ANSI draft, which became ISO Standard 8859-1. However, in adding code points to the range reserved for control codes in the ISO standard, the Windows code page 1252 and subsequent Windows code pages originally based on the ISO 8859-x series deviated from ISO. To this day, it is not uncommon to have the development community, both within and outside of Microsoft, confuse the 8859-1 code page with Windows 1252, as well as see "ANSI" or "A" used to signify Windows code page support.
Best Answer
What is the difference between Windows-1252 and ANSI encoding?
See below. In practice it probably won't make much difference to your conversion.
If you keep a copy of the original file then you can always apply a different conversion if necessary.
Having said that there are ways of converting UTF-8 to ANSI.
Windows-1252
...
Source Windows-1252
Note that is spite of the above statement by Microsoft they still call Windows 1252 "ANSI":
Source Code Page 1252 Windows Latin 1 (ANSI)