The output of locale seems to distinguish between upper and lowercase:
% locale -a
C
en_AU.utf8
en_US.utf8
POSIX
More commonly, I've seen the hyphenated and uppercase UTF-8.
What is the canonical name for utf8 / UTF-8?
Best Answer
TL;DR: Nope. utf8 doesn't refer to an IANA character set, since it drops the - character. The names that do refer to it are:
UTF-8
utf-8
uTf-8
(Note that all of these have a hyphen.)
csUTF8
The details
POSIX.1-2017, section 8.2 Internationalization Variables
But while POSIX.1 leaves the details implementation-defined, IANA has something to say about it.
RFC2978 IANA Charset Registration Procedures
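Section 2.3, Naming Requirements, defines a character set primary name as one or more characters drawn from the ASCII letters, the digits, and a handful of punctuation characters, among them - and ^.
Note that the grammar describes a letter as a "Case insensitive ASCII Letter".
Interestingly, this means that ^-^ is a happy but valid character set name.
IANA Character Sets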
IANA lists the character set as UTF-8.
While utf-8 (or uTf-8) is an official name for an IANA character set, utf8 (sans hyphen) is not an IANA character set name.
Note that there is also a !case-sensitive! alias for the name UTF-8, namely csUTF8.
If it's not IANA, where does utf8 likely come from?
glibc's _nl_normalize_codeset() does the following:
Only passes letters or digits (goodbye hyphen)
Converts letters to lowercase
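The two steps above can be sketched in a few lines of Python. This is only an illustration of those two steps, not a port of the glibc function: the name normalize_codeset is made up for this example, and the real C code handles further cases that are not modelled here.

```python
def normalize_codeset(codeset: str) -> str:
    """Hypothetical sketch of the two steps described above.

    Assumption: only ASCII letters and digits survive, and letters are
    lowercased. Anything else glibc may do is deliberately left out.
    """
    kept = []
    for ch in codeset:
        # Step 1: drop everything that is not an ASCII letter or digit,
        # which is where the hyphen in "UTF-8" disappears.
        if ch.isascii() and ch.isalnum():
            # Step 2: convert letters to lowercase (digits are unchanged).
            kept.append(ch.lower())
    return "".join(kept)


if __name__ == "__main__":
    for name in ("UTF-8", "utf-8", "uTf-8", "utf8"):
        print(name, "->", normalize_codeset(name))
```

Under these assumptions every spelling of the IANA name collapses to the same hyphen-less string, which would explain the utf8 seen in the locale -a listing.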
The code comment incorrectly says that there is no standard for the codeset names.
This comment doesn't seem cognisant of RFC2978 IANA Charset Registration Procedures, 2.3. Naming Requirements.
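For illustration, the naming rule from RFC 2978 section 2.3 can be approximated with a regular expression. The character class below is my reading of that grammar (ASCII letters, digits, and a fixed set of punctuation), so check it against the RFC before relying on it. The function names are hypothetical, only the character rule is modelled, and the csUTF8 alias is deliberately left out of the comparison.

```python
import re

# Assumed character class for a charset name, following my reading of
# RFC 2978 section 2.3: letters, digits, and ! # $ % & ' + - ^ _ ` { } ~
_CHARSET_NAME = re.compile(r"[A-Za-z0-9!#$%&'+\-^_`{}~]+")


def is_valid_charset_name(name: str) -> bool:
    """True if name is non-empty and uses only the assumed characters."""
    return _CHARSET_NAME.fullmatch(name) is not None


def is_iana_utf8(name: str) -> bool:
    """True if name matches the IANA name UTF-8, ignoring letter case."""
    return is_valid_charset_name(name) and name.upper() == "UTF-8"


if __name__ == "__main__":
    for name in ("UTF-8", "uTf-8", "utf8", "^-^", "UTF 8"):
        print(repr(name), is_valid_charset_name(name), is_iana_utf8(name))
```

In this sketch utf8 is a syntactically acceptable charset name, it just does not match the registered name UTF-8, while ^-^ passes the syntax check as the answer points out.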