Unix Command-Line – Convert Between Unicode Normalization Forms

Tags: command line, conversion, text processing, unicode

In Unicode, some character combinations have more than one representation.

For example, the character ä can be represented as

"ä", that is the codepoint U+00E4 (two bytes c3 a4 in UTF-8 encoding), or as
"ä", that is the two codepoints U+0061 U+0308 (three bytes 61 cc 88 in UTF-8).

According to the Unicode standard, the two representations are canonically equivalent but belong to different "normalization forms" (the first is NFC, the second NFD); see UAX #15: Unicode Normalization Forms.
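To make the difference visible, here is a small sketch that prints the UTF-8 bytes of both representations. The helper name hexbytes is made up for this illustration; it only wraps od and xargs.

```shell
# hexbytes: print the bytes read from stdin as space-separated hex.
# (hexbytes is a hypothetical helper name, not a standard tool.)
hexbytes() {
    od -An -tx1 | xargs
}

# Precomposed form: U+00E4, written with octal escapes for the bytes c3 a4.
printf '\303\244' | hexbytes      # prints: c3 a4

# Decomposed form: U+0061 U+0308, i.e. "a" followed by the bytes cc 88.
printf 'a\314\210' | hexbytes     # prints: 61 cc 88
```

Both strings render as the same glyph in most terminals, which is why a byte dump is the easiest way to tell them apart.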

The Unix toolbox has all kinds of text transformation tools; sed, tr, iconv and Perl come to mind. How can I do quick and easy normalization form (NF) conversion on the command line?

Best Answer

You can use the uconv utility from ICU. Normalization is achieved through transliteration (-x).

$ uconv -x any-nfd <<<ä | hd
00000000  61 cc 88 0a                                       |a...|
00000004
$ uconv -x any-nfc <<<ä | hd
00000000  c3 a4 0a                                          |...|
00000003

On Debian, Ubuntu and other derivatives, uconv is in the icu-devtools package (older releases shipped it in libicu-dev). On Fedora, Red Hat and other derivatives, and in BSD ports, it's in the icu package.

Related Solutions

How to do a regex search in a UTF-16LE file while in a UTF-8 locale

My answer is essentially the same as in your other question on this topic:

$ iconv -f UTF-16LE -t UTF-8 myfile.txt | grep pattern

As in the other question, you might need line ending conversion as well, but the point is that you should convert the file to the local encoding so you can use native tools directly.
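As a sketch of the line ending conversion mentioned above, assuming the file uses CRLF line endings: the carriage returns survive the iconv step, so a pattern anchored with $ would not match unless they are removed first. The function name grep_utf16le is made up for this example.

```shell
# grep_utf16le PATTERN FILE: search a UTF-16LE file that has CRLF line endings.
# (grep_utf16le is a hypothetical helper name, not an existing tool.)
# Step 1: convert to UTF-8 so that grep sees text in the local encoding.
# Step 2: delete every carriage return (this also removes bare CRs).
# Step 3: run grep on the converted text.
grep_utf16le() {
    iconv -f UTF-16LE -t UTF-8 "$2" |
        tr -d '\r' |
        grep -e "$1"
}
```

For example, grep_utf16le '^pattern$' myfile.txt matches whole lines, which the plain pipeline would miss on a CRLF file.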

Convert in command line an sfd file (fontforge) to ttf, otf, woff, svg

You can do it with FontForge. Its documentation describes the -c option:

-c script-string

If FontForge's first (or second, if the first is -lang) argument is "-c" then the argument that follows will be treated as a string containing scripting commands, and those commands will be executed. All remaining arguments will be passed to the script.
$ fontforge -c 'Open($1); Generate($2)' foo.sfd foo.ttf
Will read a font from "foo.sfd" and then generate a truetype font from it called "foo.ttf"

In your case, you can create a script, say convertsfd, like this:

#!/bin/bash
fontforge -lang=ff -c 'Open($1); Generate($2)' "$1" "$2"

make it executable, and call it like this:

$ ./convertsfd foo.sfd foo.ttf

Change the second argument to foo.otf, or to another format, as needed; I only tested with ttf and otf.

To call the script from anywhere, just place it in your ~/.local/bin, or some other directory in your PATH.
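To produce several formats in one go, the same FontForge call can be wrapped in a loop. This is only a sketch: the names out_name and convert_all are made up for this example, and it assumes Generate picks the output format from the file extension for woff and svg as it does for ttf and otf, which was not tested here.

```shell
# out_name FILE EXT: replace a trailing .sfd in FILE with .EXT
# (out_name and convert_all are hypothetical helper names.)
out_name() {
    printf '%s.%s\n' "${1%.sfd}" "$2"
}

# convert_all FILE.sfd: generate one output file per format next to the input.
# Assumption: FontForge infers the format from the output extension.
convert_all() {
    for ext in ttf otf woff svg; do
        fontforge -lang=ff -c 'Open($1); Generate($2)' \
            "$1" "$(out_name "$1" "$ext")" || return
    done
}
```

After sourcing these definitions, convert_all foo.sfd would attempt to write foo.ttf, foo.otf, foo.woff and foo.svg, stopping at the first format that fails.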
