Bash – How to Determine if Current Locale Uses UTF-8 Encoding

bash locale unicode

I would like to determine if the user's locale uses UTF-8 encoding.

This seems a little bit ugly:

[[ $LANG =~ UTF-8$ ]] && echo "Uses UTF-8 encoding.."

Is there a more general or portable way?

Best Answer

From Wikipedia:

On POSIX platforms, locale identifiers are defined similarly to the BCP 47 definition of language tags, but the locale variant modifier is defined differently, and the character encoding is included as a part of the identifier.

It is defined in this format: [language[_territory][.codeset][@modifier]]. (For example, Australian English using the UTF-8 encoding is en_AU.UTF-8.)

However, if the codeset suffix is missing in the locale identifier, for example as in en_AG (see this question), then the codeset is defined by a default setting for that locale, which could very well be UTF-8. As a result, the current encoding cannot be determined by looking at the LANG environment variable.
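To make the identifier format concrete, here is a minimal sketch that splits the optional codeset out of a locale identifier with plain parameter expansion. The function name codeset_of is hypothetical and only for illustration. It prints an empty line when the identifier carries no codeset, which is exactly the case where the string alone cannot tell you the encoding.

```bash
#!/usr/bin/env bash

# codeset_of (hypothetical helper): print the codeset part of a locale
# identifier of the form language[_territory][.codeset][@modifier].
codeset_of() {
    local id=$1
    id=${id%%@*}                          # drop an optional @modifier
    case $id in
        *.*) printf '%s\n' "${id#*.}" ;;  # text after the first dot
        *)   printf '\n' ;;               # identifier has no codeset
    esac
}

codeset_of "en_AU.UTF-8"   # codeset is spelled out in the identifier
codeset_of "en_AG"         # no codeset: the locale's default applies
```

An empty result does not mean "not UTF-8"; it only means the identifier does not say.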

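Further, the locale command run without arguments only shows the current values of the locale environment variables, so that output cannot be used to determine the codeset either. The codeset has to be queried explicitly: POSIX specifies locale charmap for this, which prints the name of the character map used by the current locale.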

There is a Perl module, I18N::Langinfo (see also this question), that seems to be a solution:

perl -MI18N::Langinfo=langinfo,CODESET -E 'say "Uses UTF-8 encoding .." if langinfo(CODESET()) eq "UTF-8"'

This Perl module is a wrapper for the C library function nl_langinfo.
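If Perl is not available, the same information can usually be obtained from the shell with the POSIX locale utility, whose charmap keyword reports the codeset of the current locale. The following is a sketch rather than a definitive implementation: the function name uses_utf8 is hypothetical, and accepting several spellings of the name is a defensive assumption, since the exact string can differ between platforms.

```bash
#!/usr/bin/env bash

# uses_utf8 (hypothetical helper): succeed if the current locale's
# character map is UTF-8.
# Returns 0 for UTF-8, 1 for any other codeset, 2 if `locale` fails.
uses_utf8() {
    local charmap
    charmap=$(locale charmap 2>/dev/null) || return 2
    case $charmap in
        UTF-8|utf-8|UTF8|utf8) return 0 ;;  # tolerate spelling variants (assumption)
        *)                     return 1 ;;
    esac
}

if uses_utf8; then
    echo "Uses UTF-8 encoding"
else
    echo "Does not use UTF-8 encoding (or it could not be determined)"
fi
```

Unlike matching on $LANG, this also takes LC_ALL and LC_CTYPE into account and works when the identifier has no codeset suffix.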