Bash Unicode Characters – Why Some Won’t Print to Terminal

bash fonts printf unicode

I'm running Arch Linux with simple terminal using the Adobe Source Code Pro font. My locale is correctly set to LANG=en_US.UTF-8.

I want to print Unicode characters representing playing cards to my terminal. I'm using Wikipedia for reference.

The Unicode characters for card suits work fine. For example, issuing

$ printf "\u2660"

prints a black spade (♠) to the screen.

However, I'm having trouble with specific playing cards. Issuing

$ printf "\u1F0A1"

prints the symbol Ἂ1 instead of the ace of spades (🂡). What's going wrong?

This problem persists across several terminals (urxvt, xterm, termite) and every font I've tried (DejaVu, Inconsolata).

Best Answer

The output of help printf defers to printf(1) for the escape sequences it interprets, and the documentation for GNU printf says:

printf interprets two character syntaxes introduced in ISO C 99: \u for 16-bit Unicode (ISO/IEC 10646) characters, specified as four hexadecimal digits hhhh, and \U for 32-bit Unicode characters, specified as eight hexadecimal digits hhhhhhhh. printf outputs the Unicode characters according to the LC_CTYPE locale. Unicode characters in the ranges U+0000…U+009F, U+D800…U+DFFF cannot be specified by this syntax, except for U+0024 ($), U+0040 (@), and U+0060 (`).

Something similar is specified in the Bash manual for ANSI C Quoting and echo:

\uHHHH
the Unicode (ISO/IEC 10646) character whose value is the hexadecimal value HHHH (one to four hex digits)

\UHHHHHHHH
the Unicode (ISO/IEC 10646) character whose value is the hexadecimal value HHHHHHHH (one to eight hex digits)

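In short: \u reads at most four hex digits, so \u1F0A1 is parsed as U+1F0A (Ἂ) followed by a literal 1. For a code point that needs five or more hex digits, use \U: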

$ printf "\u2660 \u1F0A1 \U1F0A1\n"
♠ Ἂ1 🂡
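
To see the difference without relying on the font, one option is to dump the bytes that printf emits. The following is only a sketch: it assumes bash 4.2 or later (where the builtin printf understands \u and \U) and a UTF-8 locale, and the variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: assumes bash >= 4.2 and a UTF-8 locale such as en_US.UTF-8.
# The variable names below are illustrative, not from the original answer.

# \u stops after four hex digits: this is U+1F0A followed by a literal "1".
four_digit=$(printf '\u1F0A1')

# \U accepts up to eight hex digits: this is the single code point U+1F0A1.
eight_digit=$(printf '\U1F0A1')

# Show both results as text.
printf '%s\n' "$four_digit" "$eight_digit"

# Dump the raw bytes so the result does not depend on glyph coverage.
# Expected bytes in a UTF-8 locale: e1 bc 8a 31 (U+1F0A, then ASCII "1").
printf '%s' "$four_digit" | od -An -tx1
# Expected bytes in a UTF-8 locale: f0 9f 82 a1 (U+1F0A1 as one character).
printf '%s' "$eight_digit" | od -An -tx1
```

If the second dump shows the four bytes f0 9f 82 a1 but the terminal still draws a box or a question mark, the escape was parsed correctly and the remaining problem is most likely that the font has no glyph for that code point.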