As I understand, the locale-gen utility generates the /usr/lib/locale/locale-archive database based on the entries in the /etc/locale.gen file and the locale template/configuration files in /usr/share/i18n/locales/. In addition, utilities store their translation files in Machine Object format under the /usr/share/locale/<locale_dir>/LC_MESSAGES/ directory. For example:
# dpkg -L wget | grep nl
/usr/share/locale/nl
/usr/share/locale/nl/LC_MESSAGES
/usr/share/locale/nl/LC_MESSAGES/wget.mo
#
When I execute, for example, strace -e open wget, I can see that both the /usr/lib/locale/locale-archive and /usr/share/locale/nl/LC_MESSAGES/wget.mo files are opened.
What localization data is stored in the files in the /usr/share/locale/<locale_dir>/LC_MESSAGES/ directory, and what localization data is stored in /usr/lib/locale/locale-archive?
Best Answer
While knowing nearly nothing upfront about how localization is implemented in Linux, I tried my best to get my head around it.
Brief Description
locale-archive is a memory-mapped file which is generated by locale-gen(8) invoking localedef(1). It holds the compiled locale definitions, such as character classification, collation order, and date, number and currency formats. Memory-mapped means that the file's pages are loaded into physical memory only once and are shared by every program that uses the archive.
Since all locales listed in /etc/locale.gen are compiled in advance and the archive itself is highly static, there is no need to have it in memory multiple times. Thus, every time another program uses it, the process gets pointed to the pages already loaded in memory, which only adds to the program's virtual memory. This way not only is the physical memory footprint of the process lowered, but locale lookups are also sped up, since no additional disk I/O is needed.
Also, it seems to work as a sort of fallback locale file containing all system-wide languages. In addition, the archive is heavily used by software linked against glibc.
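As a rough way to see this on a running system, the following sketch lists the locales compiled into the archive and then checks whether a process started under one of them has the archive mapped. It assumes a glibc-based Linux system with /proc mounted and the archive at the path from the question; the variable names are made up for this example.

```shell
#!/bin/sh
# Archive path as given in the question; other distributions may differ.
archive=/usr/lib/locale/locale-archive

if [ -r "$archive" ] && command -v localedef >/dev/null 2>&1; then
    # Show the first few locales that were compiled into the archive.
    localedef --list-archive | head -n 5

    # Pick one of them and let grep, running under that locale, print the
    # lines of its own memory map that refer to the archive.
    first=$(localedef --list-archive | head -n 1)
    LC_ALL="$first" grep locale-archive /proc/self/maps \
        || echo "archive not mapped for locale '$first'"
    archive_checked=yes
else
    echo "no readable $archive or no localedef on this system"
    archive_checked=no
fi
```

If the archive is in use, the grep line should show one or more read-only mappings of the file, which is the sharing described above.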
Internationalization (i18n, 18 chars between 'i' and 'n') of software in Linux can be achieved by using GNU gettext. Every string which needs to be printed is wrapped in a call to the gettext() function. xgettext(1) iterates over the source, creating .pot (Portable Object Template) files on its way. msginit(1) is then used to turn a template into .po (Portable Object) files, each representing the message catalog for one language. Then all strings get translated by hand. msgfmt(1) is used to compile the edited .po file into binary .mo (Machine Object) files. These can be shipped along with the software package.
When installing a package on a system, /usr/share/locale/<locale_dir>/LC_MESSAGES/ gets populated with $PROGRAM.mo files. When e.g. invoking wget, your LANG environment variable will point wget to your current locale setting, which results in wget including the right precompiled translations via pointers into the .mo binary it has read.
Additional Details and Sources
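To make the gettext workflow described above more concrete, here is a minimal sketch of the xgettext, msginit and msgfmt steps. The "hello" text domain, the file names, the temporary directory and the Dutch translation are all made up for illustration, and the hand translation is stood in for by a sed command. The tools are only run if they are installed.

```shell
#!/bin/sh
# Hypothetical demo: the "hello" domain and all file names are invented.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A tiny C source whose only translatable string is wrapped in gettext().
# It is used as input for xgettext only and is never compiled here.
cat > hello.c <<'EOF'
#include <libintl.h>
#include <locale.h>
#include <stdio.h>
int main(void) {
    setlocale(LC_ALL, "");
    bindtextdomain("hello", "./locale");
    textdomain("hello");
    puts(gettext("Hello, world!"));
    return 0;
}
EOF

have_tools=yes
for tool in xgettext msginit msgfmt msgunfmt; do
    command -v "$tool" >/dev/null 2>&1 || have_tools=no
done

if [ "$have_tools" = yes ]; then
    # 1. Extract the wrapped strings into a template (.pot).
    xgettext --language=C --from-code=UTF-8 --output=hello.pot hello.c

    # 2. Create a Dutch message catalog (.po) from the template.
    msginit --no-translator --locale=nl_NL.UTF-8 \
        --input=hello.pot --output-file=nl.po

    # 3. "Translate by hand": fill in the msgstr that follows our msgid.
    sed '/^msgid "Hello, world!"$/{
n
s/^msgstr ""$/msgstr "Hallo, wereld!"/
}' nl.po > nl.translated.po

    # 4. Compile the catalog into the binary .mo file, using the same
    #    <locale_dir>/LC_MESSAGES/ layout as /usr/share/locale.
    mkdir -p locale/nl/LC_MESSAGES
    msgfmt --output-file=locale/nl/LC_MESSAGES/hello.mo nl.translated.po

    # Decompile the result to confirm that the translation is in there.
    msgunfmt locale/nl/LC_MESSAGES/hello.mo
else
    echo "gettext tools not installed; skipping the build steps"
fi
echo "files are in $workdir"
```

A packaged program would ship the resulting file as /usr/share/locale/nl/LC_MESSAGES/$PROGRAM.mo, which matches the wget.mo path shown in the question.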
For locale-archive:
Memory-mapping: CentOS-Mailing-List
Methods I18N Subpackaging: Fedora Documentation on different locale-archive compilations
Also consider the manpages for locale(1), localedef(1) and locale-gen(8).
For .mo files:
Process of creating .mo files: Wikipedia on Gettext
GNU MO File Format: explanation and binary format
Also consider the manpages for xgettext(1), msginit(1) and msgfmt(1).
Also take a look at the environment variables LC_MESSAGES and LOCPATH.
I am sure this only scratched the surface of this vast topic. Nevertheless I hope this is enough to get you started.