Effect of $LANG on terminal

Tags: character-encoding, gnome-terminal, locale, terminal, terminal-emulator

I'm trying to learn how the $LANG variable behaves with gnome-terminal (and its character encoding preference option). I've been using iso8859-1 (latin1) as my main character-set and all my filenames are encoded as such.

For the following tests I'll do an ls -l of a directory with Spanish accented characters in their filenames:

Case #1:

  • gnome-terminal configured for ISO-8859-1
  • LANG set to "en_US-iso8859-1"
  • Result: I see all files correctly

Case #2:

  • gnome-terminal configured for UTF-8
  • LANG set to "en_US-iso8859-1"
  • Result: I see garbage characters for all Spanish characters. This is expected, as I changed the terminal's character encoding.

Case #3:

  • gnome-terminal configured for ISO-8859-1
  • LANG set to "en_US-UTF-8"
  • Result: I see garbage characters for all Spanish characters.

Why is it that in this last case I see garbled characters? Shouldn't ls send the filenames straight to gnome-terminal as they are? Since gnome-terminal is configured for ISO-8859-1, I would have expected them to look right.

For a moment I thought that bash might be looking at my $LANG variable and performing some conversion. Then I switched my terminal to UTF-8, but I still couldn't see the characters correctly. I even piped the output of ls to xxd, and to my surprise the filenames are still encoded as they are on disk: ISO-8859-1.

To wrap up: if my listing contains ISO-8859-1 characters and my terminal emulator is configured for the same character encoding, who is doing the conversion when LANG is set otherwise?

Thanks for any help you can provide.

Craconia

Best Answer

Your setting for LANG must match the terminal's. More precisely, your setting for LC_CTYPE (the character encoding) must match the terminal's encoding; the other locale settings don't need to match. The terminal's encoding is usually specified by an option of the terminal emulator, not by a locale variable. LC_CTYPE tells applications two things: what encoding to use on the terminal (for both input and output) and what encoding to use with files. In cases 2 and 3, you've told ls to produce output in an encoding that's different from the terminal's, so the output is garbled. In case 3 it is ls itself, not bash or the terminal, that alters the names: when writing to a terminal, GNU ls replaces bytes that aren't printable characters in the locale's encoding (here, latin-1 bytes that are invalid as UTF-8), which is why piping to xxd still shows the original bytes.
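To make the three cases concrete, here is a rough Python model of what happens to one filename. It is only a sketch: the filename, the helper names terminal_shows and app_output, and the "replace invalid bytes with ?" rule are assumptions for illustration, loosely modelled on how an ls-like program treats unprintable bytes when writing to a terminal. It is not how ls or gnome-terminal are actually implemented.

```python
# Hypothetical filename "año.txt", stored on disk as ISO-8859-1 bytes.
name_bytes = "a\u00f1o.txt".encode("latin-1")   # b'a\xf1o.txt'


def app_output(data: bytes, locale_encoding: str) -> bytes:
    """Rough model of an ls-like program writing to a terminal.

    Bytes that are not valid in the locale's encoding are replaced
    with '?'; everything else passes through unchanged.
    """
    text = data.decode(locale_encoding, errors="replace")
    return text.replace("\ufffd", "?").encode(locale_encoding)


def terminal_shows(data: bytes, terminal_encoding: str) -> str:
    """What a terminal configured for terminal_encoding would draw."""
    return data.decode(terminal_encoding, errors="replace")


# Case 1: locale latin-1, terminal latin-1 -> name displayed correctly.
case1 = terminal_shows(app_output(name_bytes, "latin-1"), "latin-1")

# Case 2: locale latin-1, terminal UTF-8 -> the terminal cannot decode 0xF1.
case2 = terminal_shows(app_output(name_bytes, "latin-1"), "utf-8")

# Case 3: locale UTF-8, terminal latin-1 -> the program already wrote '?'.
case3 = terminal_shows(app_output(name_bytes, "utf-8"), "latin-1")

print(case1, case2, case3)
```

In this model the bytes on disk never change; only the program's idea of which bytes are printable (case 3) or the terminal's decoding (case 2) does.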

If you work with both UTF-8 and latin-1 encodings at different times, configure your terminal to use UTF-8. This should cause it to set LC_CTYPE to a value indicating UTF-8; don't override this setting. (If the terminal emulator doesn't set LC_CTYPE, do set it yourself in your shell startup file or for your whole session.) To work with latin-1 data in a UTF-8 terminal, use luit (included in the X utility suite):

LC_CTYPE=en_US.iso88591 luit

(You can use any other locale with the same encoding, e.g. LC_CTYPE=es_ES.iso88591 luit.)
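Conceptually, luit sits between the application and the UTF-8 terminal and translates in both directions. The Python sketch below only illustrates that idea; the function names to_terminal and from_terminal are hypothetical, and this is not luit's actual implementation.

```python
def to_terminal(app_bytes: bytes, app_encoding: str = "iso-8859-1") -> bytes:
    """Application output (in the locale's encoding) -> bytes for a UTF-8 terminal."""
    return app_bytes.decode(app_encoding).encode("utf-8")


def from_terminal(key_bytes: bytes, app_encoding: str = "iso-8859-1") -> bytes:
    """UTF-8 input from the terminal -> bytes in the application's encoding."""
    return key_bytes.decode("utf-8").encode(app_encoding)


# A latin-1 "ñ" (0xF1) becomes the two-byte UTF-8 sequence 0xC3 0xB1, and back.
print(to_terminal(b"a\xf1o"))
print(from_terminal(b"a\xc3\xb1o"))
```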
