I'm trying to learn how the $LANG variable behaves with gnome-terminal (and its character encoding preference option). I've been using iso8859-1 (latin1) as my main character set, and all my filenames are encoded as such.
For the following tests I'll run ls -l on a directory containing files with Spanish accented characters in their names:
Case #1:
- gnome-terminal configured for ISO-8859-1
- LANG set to "en_US-iso8859-1"
- Result: I see all files correctly.

Case #2:
- gnome-terminal configured for UTF-8
- LANG set to "en_US-iso8859-1"
- Result: I see garbage characters for all Spanish characters. This is expected, as I changed the character encoding of the terminal.

Case #3:
- gnome-terminal configured for ISO-8859-1
- LANG set to "en_US-UTF-8"
- Result: I see garbage characters for all Spanish characters.
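To make the cases concrete, here is a minimal Python sketch that models only the terminal's side: it takes the raw bytes ls writes and decodes them the way a terminal set to a given encoding would. The filename "niño" and the helper terminal_renders are hypothetical, chosen for illustration. The model reproduces cases 1 and 2; under it, case 3 would render correctly, which is exactly the puzzle raised here.

```python
# Minimal model of the terminal's side only: it receives raw bytes from ls
# and maps them to characters using its configured encoding.
# "ni\u00f1o" ("niño") is a hypothetical filename chosen for illustration.

LATIN1_NAME = "ni\u00f1o".encode("iso-8859-1")  # b"ni\xf1o", as stored on disk


def terminal_renders(raw: bytes, terminal_encoding: str) -> str:
    """Decode bytes the way a terminal set to `terminal_encoding` would.

    Bytes that are invalid in that encoding become U+FFFD, which stands in
    for the "garbage" a UTF-8 terminal shows for stray latin-1 bytes.
    """
    return raw.decode(terminal_encoding, errors="replace")


if __name__ == "__main__":
    # ascii() escapes non-ASCII code points so the demo prints on any console.
    # Case 1: terminal expects ISO-8859-1 and receives ISO-8859-1 bytes.
    print(ascii(terminal_renders(LATIN1_NAME, "iso-8859-1")))  # 'ni\xf1o' (niño)
    # Case 2: the same bytes, but the terminal expects UTF-8.
    print(ascii(terminal_renders(LATIN1_NAME, "utf-8")))  # 'ni\ufffdo'
```

The lone 0xF1 byte is a valid "ñ" in ISO-8859-1 but an incomplete sequence in UTF-8, so only case 2 comes out garbled in this model.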
Why do I see garbled characters in this last case? Shouldn't ls send the filenames straight to gnome-terminal as they are? Since gnome-terminal is configured for ISO-8859-1, I would have expected them to look right.
For a moment I thought that bash might be looking at my $LANG variable and performing some conversion. Then I switched my terminal to UTF-8, but I still can't see the characters correctly. I even piped the output of ls to xxd, and to my surprise the filenames are still encoded as they are on disk: ISO-8859-1.
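The bytes on disk don't change with LANG, so a hex dump is a reliable way to tell the encodings apart: "ñ" is the single byte f1 in ISO-8859-1 but the two bytes c3 b1 in UTF-8. The Python sketch below prints each name in the current directory as hex, the way xxd would, plus a guess. The helper describe_name is hypothetical, and the guess is only a heuristic (a latin-1 name can occasionally happen to be valid UTF-8).

```python
import os


def describe_name(name: bytes) -> str:
    """Show a filename's raw bytes in hex (like xxd) plus an encoding guess.

    Heuristic, not a rule: pure ASCII is unambiguous; bytes that decode as
    UTF-8 are probably UTF-8; anything else is probably ISO-8859-1 here.
    """
    hex_bytes = name.hex(" ")  # space-separated hex pairs (Python 3.8+)
    if name.isascii():
        guess = "ascii"
    else:
        try:
            name.decode("utf-8")
            guess = "utf-8"
        except UnicodeDecodeError:
            guess = "not utf-8 (likely iso-8859-1)"
    return f"{hex_bytes}  {guess}"


if __name__ == "__main__":
    # Passing bytes to os.listdir makes it return names as raw bytes,
    # without decoding them through the current locale.
    for entry in sorted(os.listdir(b".")):
        print(describe_name(entry))
```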
To wrap up: if my listing contains ISO-8859-1 characters and my terminal emulator is configured for the same character encoding, who's doing the conversion when LANG is set to something else?
Thanks for any help you can provide.
Craconia
Best Answer
Your setting for LANG must match the terminal's. More precisely, your setting for LC_CTYPE (the character encoding) must match the terminal's encoding; the other locale settings don't need to match. The terminal's encoding is usually specified by an option of the terminal emulator, not by a locale variable. LC_CTYPE combines two indications: it tells applications what encoding to use on the terminal (both for input and output), and it tells applications what encoding to use with files. In cases 2 and 3, you've told ls to display output in an encoding that's different from the terminal's, so the output is garbled. In case 3 it is ls itself that alters the names: under a UTF-8 LC_CTYPE the latin-1 bytes aren't valid characters, so GNU ls, when writing to a terminal, replaces or escapes them (you see ? marks or escape sequences instead). When its output goes to a pipe, as with xxd, it writes the original bytes unchanged, which is why xxd still showed ISO-8859-1.

If you work with both UTF-8 and latin-1 encodings at different times, configure your terminal to use UTF-8. This should cause it to set LC_CTYPE to a value indicating UTF-8; don't override this setting. (If the terminal emulator doesn't set LC_CTYPE, do override it in your shell startup file or for your whole session.) To work with latin-1 data in a UTF-8 terminal, run luit (included in the X utility suite) under a latin-1 locale. Any locale with that encoding will do, e.g.

LC_CTYPE=es_ES.iso88591 luit
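luit sits between the UTF-8 terminal and the program it runs, translating in both directions according to the locale it was started under. The Python sketch below shows only that byte translation for the latin-1 case; the function names are hypothetical and this is not how luit itself is implemented.

```python
# Hypothetical helpers illustrating the translation luit performs between a
# latin-1 program and a UTF-8 terminal. This is NOT luit's actual code.


def program_to_terminal(data: bytes) -> bytes:
    """Recode program output from ISO-8859-1 to UTF-8.

    Every byte is a valid ISO-8859-1 character, so this never fails.
    """
    return data.decode("iso-8859-1").encode("utf-8")


def terminal_to_program(data: bytes) -> bytes:
    """Recode keyboard input from UTF-8 to ISO-8859-1.

    Invalid UTF-8 and characters with no latin-1 equivalent (such as the
    euro sign) become '?'.
    """
    text = data.decode("utf-8", errors="replace")
    return text.encode("iso-8859-1", errors="replace")


if __name__ == "__main__":
    print(program_to_terminal(b"ni\xf1o"))      # b'ni\xc3\xb1o'
    print(terminal_to_program(b"ni\xc3\xb1o"))  # b'ni\xf1o'
```

For a one-off listing, the output half of this is also available from iconv: ls | iconv -f ISO-8859-1 -t UTF-8. Since ls is writing to a pipe, it leaves the bytes alone, and iconv recodes them for the terminal.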