CentOS – the proper encoding name to use in locale for UTF-8

centos gdm locale x11

On this CentOS 7 system, locale -a lists the UTF-8 locale with the codeset spelled "utf8":

$ locale -a 
<snip>
en_US.utf8
<snip>

and yet:

$ localectl 
System Locale: LANG=en_US.UTF-8

To add to that, the preferred name according to X11 (/usr/share/X11/locale/locale.dir) is:

$ grep 'en_US.UTF-8$' /usr/share/X11/locale/locale.dir 
en_US.UTF-8/XLC_LOCALE                  en_US.UTF-8
en_US.UTF-8/XLC_LOCALE:                 en_US.UTF-8

Luckily for en_US.utf8, there is an alias:

$ grep 'en_US.utf8' /usr/share/X11/locale/locale.alias
en_US.utf8                                      en_US.UTF-8
en_US.utf8:                                     en_US.UTF-8

Others aren't so lucky, e.g. ru_UA.utf8, which has no alias even though locale.dir lists ru_UA.UTF-8:

$ locale -a | grep ru_UA.utf8
ru_UA.utf8
$ grep 'ru_UA.utf8' /usr/share/X11/locale/locale.alias
$ grep 'ru_UA.UTF-8' /usr/share/X11/locale/locale.dir
en_US.UTF-8/XLC_LOCALE                  ru_UA.UTF-8
en_US.UTF-8/XLC_LOCALE:                 ru_UA.UTF-8

This is annoying when the selected locale has no entry in the X11 locale.alias, because GDM (or gnome-session?) forces the "utf8" spelling, which breaks X programs with messages like: "Warning: locale not supported by Xlib, locale set to C". I could just edit /usr/share/X11/locale/locale.alias, but it would be nice to know which spelling is actually right.
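
To see how many locales are affected before touching locale.alias, something like the following could list them. This is only a sketch: missing_x11_aliases is a hypothetical helper name, and it assumes alias lines have the two-column layout shown in the grep output above (name, optionally followed by a colon, then the target). Names with an @modifier are skipped.

```shell
# Hypothetical helper: reads locale names on stdin (e.g. from `locale -a`)
# and prints those ending in ".utf8" that have no entry in the alias file.
# The alias file path is the first argument and defaults to the X11 one.
missing_x11_aliases() {
    alias_file=${1:-/usr/share/X11/locale/locale.alias}
    while IFS= read -r name; do
        case "$name" in
            *.utf8)
                # Assumed layout: first column is the alias, with or
                # without a trailing colon. Exact string match, no regex.
                if ! awk -v n="$name" \
                    '$1 == n || $1 == (n ":") { found = 1 }
                     END { exit !found }' "$alias_file"; then
                    printf '%s\n' "$name"
                fi
                ;;
        esac
    done
}

# Usage (not run here):
#   locale -a | missing_x11_aliases
```

Each name printed would need its own alias line (or a fix on the GDM side) for Xlib to accept it.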

Best Answer

Comments in GNU libc sources (intl/l10nflist.c:_nl_normalize_codeset) state:

There is no standard for the codeset names.

That function normalizes codeset names to all-lowercase with every non-alphanumeric character stripped, so "UTF-8" becomes "utf8".

The locale names inside the locale archive use normalized codeset names, which is why locale -a prints en_US.utf8.
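
The normalization rule described above can be sketched in shell. This is not glibc's code, only an illustration of the rule as stated here (lowercase, non-alphanumerics dropped). The function names normalize_codeset and normalize_locale are made up for this example, and the second one assumes a name of the form language_TERRITORY.codeset with no @modifier.

```shell
# Sketch of the rule as described: keep only alphanumeric characters
# and lowercase the letters. LC_ALL=C limits the classes to ASCII.
normalize_codeset() {
    printf '%s' "$1" \
        | LC_ALL=C tr -cd '[:alnum:]' \
        | LC_ALL=C tr '[:upper:]' '[:lower:]'
}

# Apply it to the part after the first dot of a locale name;
# names without a dot are printed unchanged.
normalize_locale() {
    case "$1" in
        *.*) printf '%s.%s\n' "${1%%.*}" "$(normalize_codeset "${1#*.}")" ;;
        *)   printf '%s\n' "$1" ;;
    esac
}

normalize_locale en_US.UTF-8   # prints en_US.utf8
normalize_locale ru_UA.UTF-8   # prints ru_UA.utf8
```

Both spellings end up as the same normalized name, which matches what locale -a shows.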

Since there is no standard, GDM is well within its rights to use "utf8" and locales like 'ru_UA.utf8' are not invalid. "utf8" may not be preferred, but it is definitely acceptable (at least by libc standards) as it is the normalized form.
