Command Line – Character Encodings Supported by more, cat, and less


According to the file command, my text file is encoded as follows:

ISO-8859 text, with CRLF line terminators

This file contains French text with accents. My shell can display accented characters, and Emacs in console mode displays these accents correctly.

My problem is that more, cat and less don't display this file correctly. I guess this means these tools don't support this character encoding. Is that true? Which character encodings do these tools support?

Best Answer

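Your shell can display accents because it is probably using UTF-8. The file, however, uses a different encoding (ISO-8859-1), so when cat, more or less send its bytes to the terminal, they are interpreted as UTF-8. The accented characters are not valid UTF-8 in that encoding, so they are displayed incorrectly. You can check your current encoding with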

echo $LANG
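cat does not decode anything itself. It passes bytes through, and the terminal decodes them according to the locale. One way to confirm that the file is the problem is to test whether it decodes as UTF-8. The sketch below relies on iconv exiting with a non-zero status on invalid input. The helper name is_valid_utf8 is hypothetical, and the script builds its own Latin-1 sample instead of touching your foo.txt:

```shell
#!/bin/sh
# Show the locale the shell (and the terminal session) is using.
echo "LANG=$LANG"

# Hypothetical helper: succeed if the file decodes cleanly as UTF-8.
# iconv exits with a non-zero status when it meets an invalid byte sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Build a small ISO-8859-1 sample: octal \340 is "à" and \352 is "ê" in Latin-1.
sample=$(mktemp)
printf "J'ai mal \340 la t\352te, c'est chiant!\r\n" > "$sample"

if is_valid_utf8 "$sample"; then
    echo "sample is valid UTF-8"
else
    echo "sample is not valid UTF-8 (e.g. ISO-8859-1)"
fi
rm -f "$sample"
```

To check your own file, run is_valid_utf8 foo.txt and look at its exit status.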

You have two choices: either change your default encoding, or convert the file to UTF-8. To change your encoding, open a terminal and type

export LANG="fr_FR.ISO-8859-1"

For example:

$ echo $LANG
en_US.UTF-8
$ cat foo.txt
J'ai mal � la t�te, c'est chiant!
$ export LANG="fr_FR.ISO-8859-1"
$ xterm    # open a new terminal
$ cat foo.txt
J'ai mal à la tête, c'est chiant!
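Switching LANG only helps if the ISO-8859-1 French locale is actually installed. locale -a lists the installed locales, and glibc usually reports this one in normalized form (fr_FR.iso88591). The following is a minimal sketch that assumes a POSIX shell and the locale command. The helper has_locale is hypothetical and compares lower-cased names with dashes removed, so that ISO-8859-1 and iso88591 match:

```shell
#!/bin/sh
# Hypothetical helper: succeed if `locale -a` lists the given locale.
# Names are lower-cased and stripped of dashes before comparing, because
# glibc normalizes codeset names (ISO-8859-1 is listed as iso88591).
has_locale() {
    wanted=$(printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g')
    locale -a 2>/dev/null | tr '[:upper:]' '[:lower:]' | sed 's/-//g' |
        grep -qxF "$wanted"
}

# Only switch LANG when the target locale exists.
if has_locale fr_FR.ISO-8859-1; then
    export LANG=fr_FR.ISO-8859-1
    echo "LANG is now $LANG"
else
    echo "fr_FR.ISO-8859-1 is not installed; see locale -a" >&2
fi
```

The export only affects the shell that runs it. To change the current interactive shell, source the snippet with . instead of executing it.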

If you are using gnome-terminal, Terminator or a similar terminal emulator, you may also need to switch the terminal's own character encoding to ISO-8859-1. In Terminator, right-click inside the window and choose the encoding from the context menu. gnome-terminal offers the same setting in its menus.

The other (and better) option is to convert the file to UTF-8:

$ cat foo.txt 
J'ai mal � la t�te, c'est chiant!
$ iconv -f ISO-8859-1 -t UTF-8  foo.txt > bar.txt
$ cat bar.txt 
J'ai mal à la tête, c'est chiant!
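To convert the file in place instead of writing bar.txt, you cannot redirect iconv's output back to its own input, because the shell truncates the file before iconv reads it. The sketch below writes to a temporary file first and copies the result back only if iconv succeeded. The helper name to_utf8 is hypothetical, and the script assumes the source file really is ISO-8859-1:

```shell
#!/bin/sh
# Hypothetical helper: convert a file from ISO-8859-1 to UTF-8 in place.
to_utf8() {
    tmp=$(mktemp) || return 1
    # Write to a temp file: redirecting straight back to "$1" would
    # truncate the input before iconv reads it.
    if iconv -f ISO-8859-1 -t UTF-8 "$1" > "$tmp"; then
        # Copy the contents back (rather than mv) so the original
        # file keeps its permissions.
        cat "$tmp" > "$1"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Usage: to_utf8 foo.txt
# To just read the file without converting it, decode on the fly:
#   iconv -f ISO-8859-1 -t UTF-8 foo.txt | less
```

Running it twice on the same file would double-encode the accents, so convert each file only once.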