Bash – How to Convert Unicode Codepoint to Printable Character

bash unicode

I have a list of Unicode codepoints, but I don't know of a "simple" way to convert these hex values into the actual characters they represent…

I've heard that zsh has echo -e '\u0965', but I use bash 4.1.

Is there something as simple as the zsh method, for bash?

Best Answer

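You can use bash's echo or /bin/echo from GNU coreutils in combination with iconv. Write the codepoint as its two UTF-16BE bytes (U+0965 becomes \x09\x65) and let iconv convert them: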

echo -ne '\x09\x65' | iconv -f utf-16be
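
Since the question mentions a whole list of codepoints, here is a minimal sketch of the same idea wrapped in a function. The name cp2char is hypothetical, and it assumes your iconv knows UTF-32BE (GNU libc and libiconv both do). It uses UTF-32BE rather than UTF-16BE so that codepoints above U+FFFF need no surrogate pairs, and it passes -t UTF-8 explicitly instead of relying on the locale.

```shell
# Hypothetical helper: print the characters for the hex codepoints given
# as arguments. Assumes bash and an iconv that supports UTF-32BE.
cp2char() {
    local cp hex
    for cp in "$@"; do
        # Zero-pad the codepoint to 8 hex digits (4 bytes of UTF-32BE).
        hex=$(printf '%08x' "0x$cp")
        # Emit those 4 bytes as raw binary using \xHH escapes.
        printf "\\x${hex:0:2}\\x${hex:2:2}\\x${hex:4:2}\\x${hex:6:2}"
    done | iconv -f UTF-32BE -t UTF-8
    printf '\n'
}

cp2char 0965 263A 1F600   # prints ॥☺😀
```

If the codepoints are in a file, one per line, something like cp2char $(cat codepoints.txt) should work, where codepoints.txt is a placeholder name.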

By default, iconv converts to your locale's encoding. Perl is perhaps more portable than relying on a specific shell or echo command. Nearly every UNIX system I am aware of will have Perl available, and it even has several Windows ports.

perl -C -e 'print chr 0x0965'

Most of the time when I need to do this, I'm in an editor like Vim/GVim, which has built-in support. While in insert mode, press Ctrl-V followed by u, then type four hex digits. If you want a character beyond U+FFFF, use a capital U and type eight hex digits.

Vim also supports custom keymaps, which are easy to make. A keymap converts a series of characters to another symbol. For example, I have a keymap I developed called www that converts TM to ™, (C) to ©, (R) to ®, and so on. I also have a keymap for Klingon for when that becomes necessary. I'm sure Emacs has something similar.

If you are in a GTK+ app, which includes GVim and GNOME Terminal, you can try Ctrl-Shift-u followed by four hex digits to create a Unicode character. I'm sure KDE/Qt has something similar.

UPDATE: As of Bash 4.2, this is a built-in feature:

echo $'\u0965'
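
The $'...' form only works with a literal in the script. When the codepoints come from a variable or a list, one possible approach is to build the escape at run time and let the printf builtin interpret it. This is a sketch assuming Bash 4.2 or later and a UTF-8 locale. The function name u2char is hypothetical.

```shell
# Hypothetical helper for Bash 4.2+: relies on the \U escape that the
# printf builtin understands in its format string.
u2char() {
    local cp
    for cp in "$@"; do
        # Normalise to 8 hex digits so \U always sees a full codepoint,
        # then let the outer printf expand the \UHHHHHHHH escape.
        printf "\\U$(printf '%08x' "0x$cp")"
    done
    printf '\n'
}

u2char 0965 263A 1F600   # in a UTF-8 locale this should print ॥☺😀
```

Outside a UTF-8 locale, Bash tries to convert the character to the locale's encoding, so the output may differ.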

UPDATE: Also, nowadays a Python example would probably be preferred to Perl. This works in both Python 2 and Python 3 (3.3 or later, which restored the u prefix):

python -c 'print(u"\u0965")'