Linux – UTF-8 locale portability (and ssh)

Tags: bsd, linux, locale, shell, utf-8

I spend a lot of my time sshed into various machines, all of which are different (some are embedded, some run Linux, some run BSD, &c.). On my own local machines, however, i use OS X, which of course has a userland based on BSD. My locale on those machines is set to en_GB.UTF-8, which is one of the available options:

% echo `sw_vers`
ProductName: Mac OS X ProductVersion: 10.8.2 BuildVersion: 12C60
% locale -a | grep -i 'en_gb.utf'
en_GB.UTF-8

Several of the more-capable Linux systems i use appear to have an equivalent option, but i note that on Linux the name is slightly different:

% lsb_release -d
Description: Debian GNU/Linux 6.0.3 (squeeze)
% locale -a | grep -i 'en_gb.utf' 
en_GB.utf8

This makes me wonder: When i ssh into a Linux machine from my Mac, and it forwards all of my LC_* variables with that 'UTF-8' suffix, does that Linux machine even understand what is being asked of it? Or is it just falling back to some other locale?

edit: Here is an example of what i'm referring to:

% ssh -v odin
...
debug1: Entering interactive session.
debug1: Sending environment.
debug1: Sending env LC_ALL = en_GB.UTF-8
debug1: Sending env LC_COLLATE = en_GB.UTF-8
debug1: Sending env LC_CTYPE = en_GB.UTF-8
debug1: Sending env LC_MESSAGES = en_GB.UTF-8
debug1: Sending env LC_MONETARY = en_GB.UTF-8
debug1: Sending env LC_NUMERIC = en_GB.UTF-8
debug1: Sending env LC_TIME = en_GB.UTF-8
debug1: Sending env LANG = en_GB.UTF-8
odin:~ % locale | tail -1  # locale is set to .UTF-8 without error...
LC_ALL=en_GB.UTF-8
odin:~ % locale -a | grep 'en_GB.UTF-8'  # ... even though .UTF-8 isn't an option
odin:~ % 

In either case, what is the mechanism behind its behaviour, and is it dependent on any particular set-up (e.g., will i see the same behaviour on a BusyBox-based system as on a GNU-based one)?

Best Answer

It's an interesting question, but I think there may be a misconception in there about how variables are set up. When a secure shell session is initiated (ssh remotehost), what happens at the other end is an instantiation of a new shell with a separate environment. That is a fancy way of saying that the server starts a fresh shell. That new shell may or may not be configured with the same locale as your original local shell.

For example:

geee: ~
$ echo `locale |grep LANG` :: `date`
LANG=en_US.UTF-8 :: Mon Dec 3 07:04:00 CET 2012

$ ssh flode
flode: ~
$ echo `locale |grep LANG` :: `date`
LANG=nb_NO.UTF-8 LANGUAGE=nb_NO.UTF-8 :: ma. 03. des. 06:59:33 +0100 2012

In order to demonstrate this, I set up the locale on the remote shell for Norwegian by adding the following lines to the ~/.bash_profile file:

export     LANG=nb_NO.UTF-8
export LANGUAGE=nb_NO.UTF-8
export   LC_ALL=nb_NO.UTF-8

Similarly, you will have to set up the environment on the remote shell to do the same. Of course, other shells read different startup files such as ~/.zprofile for the Z shell.
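Which startup file to edit depends on the login shell on the remote host. A rough sketch of that mapping follows; profile_for_shell is a made-up helper name, and the table only covers a few common shells under the assumption that they use their usual default files:

```shell
# profile_for_shell SHELLPATH: print the per-user startup file that a login
# shell of that kind usually reads. Hypothetical helper, for illustration only.
profile_for_shell() {
    case "${1##*/}" in
        # bash falls back to ~/.bash_login and then ~/.profile if this file is absent
        bash)     printf '%s\n' "$HOME/.bash_profile" ;;
        zsh)      printf '%s\n' "$HOME/.zprofile" ;;
        csh|tcsh) printf '%s\n' "$HOME/.login" ;;
        # assumption: anything else behaves like a Bourne-style shell
        *)        printf '%s\n' "$HOME/.profile" ;;
    esac
}
```

Running profile_for_shell "$SHELL" on the remote host then names the file where the export lines above would go.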

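The misconception I suspected is that local variables (settings) are forwarded as a matter of course. They are not: OpenSSH only passes them along when the client names them in SendEnv and the server permits them with AcceptEnv, and even then the startup files on the remote side can override whatever arrives. Either way, the remote shell has its own settings. To list the available locales on the remote host, be it a minimalistic BusyBox system or a full-blown GNU OS, use the locale command with the -a switch (as noted in the question). Any of the printed lines may be used as a locale setting for that environment.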
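One caveat when comparing a name against that list: glibc normalizes the codeset part of a locale name (lowercasing it and dropping punctuation), so en_GB.UTF-8 and en_GB.utf8 refer to the same locale there, and a literal grep can miss it. Below is a minimal sketch of a check that accounts for this. The function names are invented for illustration, the normalization only approximates what glibc does, and other C libraries (BSD, or whatever a BusyBox system ships with) may behave differently:

```shell
# normalize_locale NAME: print NAME with its codeset lowercased and stripped
# of punctuation, e.g. en_GB.UTF-8 -> en_GB.utf8. Names without a codeset
# (such as C or en_GB) are printed unchanged.
normalize_locale() {
    name=$1
    case $name in
        *.*) ;;
        *) printf '%s\n' "$name"; return 0 ;;
    esac
    lang=${name%%.*}          # language_TERRITORY part
    rest=${name#*.}           # codeset, optionally followed by @modifier
    case $rest in
        *@*) codeset=${rest%%@*}; modifier=@${rest#*@} ;;
        *)   codeset=$rest; modifier= ;;
    esac
    codeset=$(printf '%s' "$codeset" | tr '[:upper:]' '[:lower:]' | tr -cd '[:alnum:]')
    printf '%s.%s%s\n' "$lang" "$codeset" "$modifier"
}

# locale_available NAME: succeed if some line of `locale -a` matches NAME
# once both sides are normalized. Fails if `locale` is missing.
locale_available() {
    wanted=$(normalize_locale "$1")
    for have in $(locale -a 2>/dev/null); do
        if [ "$(normalize_locale "$have")" = "$wanted" ]; then
            return 0
        fi
    done
    return 1
}
```

For instance, locale_available en_GB.UTF-8 && export LANG=en_GB.UTF-8 would only set the variable when a matching locale is actually installed.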

As for the first question, the default locale that any shell starts with is usually configured in a central place such as /etc/profile. Most login shells read this file on startup.
