Your shell can display accents and other non-ASCII characters because it is probably using UTF-8. Since the file in question is in a different encoding, less, more and cat try to read it as UTF-8 and fail. You can check your current encoding with:
echo $LANG
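$LANG only names the locale, and LC_ALL or LC_CTYPE can override it. As a rough sketch, the two helpers below report the codeset actually in effect and test whether a file decodes as UTF-8. The function names current_charmap and is_utf8 are made up for this example; locale charmap and iconv are the standard tools.

```shell
# Hypothetical helper: print the character encoding the current locale uses.
# LC_ALL and LC_CTYPE take precedence over LANG, so the result can differ
# from what "echo $LANG" suggests.
current_charmap() {
    locale charmap
}

# Hypothetical helper: succeed only if the file is valid UTF-8.
# iconv exits non-zero when it meets a byte sequence that is not valid
# in the source encoding. Pure ASCII files also pass, since ASCII is a
# subset of UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}
```

For example, `is_utf8 foo.txt || echo "foo.txt is not UTF-8"` would flag the file from the question if it is stored as ISO-8859-1.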
You have two choices: either change your default encoding, or convert the file to UTF-8. To change your encoding, open a terminal and type:
export LANG="fr_FR.ISO-8859-1"
For example:
$ echo $LANG
en_US.UTF-8
$ cat foo.txt
J'ai mal � la t�te, c'est chiant!
$ export LANG="fr_FR.ISO-8859-1"
$ xterm    # open a new terminal
$ cat foo.txt
J'ai mal à la tête, c'est chiant!
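If you are using gnome-terminal or similar, you may need to activate the encoding in the terminal emulator itself. In terminator, for example, right-click in the window and choose the encoding from the menu; gnome-terminal offers a similar setting.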
Your other (better) option is to change the file's encoding:
$ cat foo.txt
J'ai mal � la t�te, c'est chiant!
$ iconv -f ISO-8859-1 -t UTF-8 foo.txt > bar.txt
$ cat bar.txt
J'ai mal à la tête, c'est chiant!
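Note that redirecting iconv's output to the file it is reading (iconv ... foo.txt > foo.txt) truncates the file before iconv gets to read it. If you want to keep the original name, one possible approach is to go through a temporary file. This is only a sketch: the function name to_utf8_inplace is invented here, and mktemp, while widely available, is not required by POSIX.

```shell
# Hypothetical helper: convert file $1 from encoding $2 to UTF-8,
# replacing the original only if the conversion succeeded.
# Caveat: the result takes the temporary file's permissions
# (mktemp creates it readable and writable by the owner only).
to_utf8_inplace() {
    # Create the temporary file next to the original so that mv is a
    # rename on the same filesystem.
    tmp=$(mktemp "$1.XXXXXX") || return 1
    if iconv -f "$2" -t UTF-8 "$1" > "$tmp"; then
        mv -- "$tmp" "$1"
    else
        rm -f -- "$tmp"
        return 1
    fi
}
```

With the example above, this would be called as `to_utf8_inplace foo.txt ISO-8859-1`.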
I'd refine your script to:
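set -o noclobber    # refuse to overwrite an existing "$f"-new
for f in ./*.csv
do
    # Convert only the files that file(1) identifies as UTF-16LE.
    if [ "$(file -b --mime-encoding "$f")" = utf-16le ]; then
        # Replace the original only if iconv succeeded.
        iconv -f UTF-16 -t UTF-8 "$f" > "$f"-new &&
            mv "$f"-new "$f"
    fi
done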
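If you would rather not depend on file's guess, a cruder alternative is to look for the byte order mark yourself. The sketch below is a different technique from the script above, not a drop-in equivalent: it only recognises files that start with the bytes FF FE, and the function name has_utf16le_bom is made up for this example.

```shell
# Hypothetical helper: succeed if the file starts with the UTF-16LE
# byte order mark (bytes FF FE).
# Caveats: UTF-16LE files written without a BOM are not detected, and a
# UTF-32LE BOM (FF FE 00 00) begins with the same two bytes.
has_utf16le_bom() {
    # od -N2 reads only the first two bytes; tr strips od's padding.
    [ "$(od -An -tx1 -N2 "$1" | tr -d ' \n')" = "fffe" ]
}
```

It could be used in place of the file test, as in `if has_utf16le_bom "$f"; then ...`.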
There is no specific character encoding mandated by POSIX. The only character in a fixed position is null, which must be 00.
What POSIX does require is that all characters from its Portable Character Set exist. The Portable Character Set contains the printable ASCII characters, space, BEL, backspace, tab, carriage return, newline, vertical tab, form feed, and null. Where or how those are encoded is not specified, except that each must be represented in a single byte and the digits 0 to 9 must be encoded contiguously, in ascending order.
It imposes no other restrictions on the representation of characters, so a conforming system is free to support encodings with any representation of those characters, and any other characters in addition.
Different locales on the same system can have different representations of those characters, with the exception of "." and "/".
The only files that all POSIX-compliant systems are required to treat in the same way are files consisting entirely of null bytes. Files treated as text have their lines terminated by the encoding's representation of the PCS's newline character.
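To see which byte values the current locale assigns to some of these characters, a quick check along the following lines may help. The helper names pcs_code and nul_byte_hex are made up for this sketch; printf's leading-quote argument form and od are standard. The numbers you get depend on the locale's encoding, so treat any specific value as a property of your system rather than of POSIX.

```shell
# Hypothetical helper: print the numeric value the current locale's
# encoding assigns to the single character given as $1.
# A leading quote in a printf numeric argument means "the value of the
# character that follows".
pcs_code() {
    printf '%d\n' "'$1"
}

# Hypothetical helper: print the null character as hex. This is the one
# value that is fixed everywhere: all bits zero.
nul_byte_hex() {
    printf '\000' | od -An -tx1 | tr -d ' \n'
}
```

On a system with an ASCII-compatible encoding, `pcs_code .` and `pcs_code /` print 46 and 47; a conforming system with a different encoding could print other values, while `nul_byte_hex` prints 00 in every case.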