MacOS – What is the difference between the UTF-8 and UTF-8-MAC encodings in iconv, and how is each used?

What is the difference between the UTF-8 and UTF-8-MAC encodings in iconv, and how is each used?
At first I thought it was about line endings (\n versus the \r used by Mac OS 9).
But when I ran iconv -f UTF-8 -t UTF-8-MAC filename > filename2, the file content did not change in a hex view.

Best Answer

As explained here, UTF-8-MAC is the UTF-8 encoding of text after Unicode normalization form NFD has been applied (e.g. accented characters are represented by the base character followed by a combining accent character), with certain code point ranges excluded from decomposition.

For example, the character é can be represented in two different, equally valid ways in Unicode:

  • "\x{00E9}" - a single code point, LATIN SMALL LETTER E WITH ACUTE; UTF-8 bytes C3 A9; "composed".
  • "\x{0065}\x{0301}" - two code points, LATIN SMALL LETTER E and COMBINING ACUTE ACCENT; UTF-8 bytes 65 CC 81; "decomposed".
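A small Python sketch (using the standard unicodedata module, not iconv itself) makes the two forms concrete: it prints their UTF-8 bytes and shows that NFD and NFC convert between them.

```python
import unicodedata

composed = "\u00e9"          # LATIN SMALL LETTER E WITH ACUTE
decomposed = "\u0065\u0301"  # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

print(composed.encode("utf-8").hex(" "))    # c3 a9
print(decomposed.encode("utf-8").hex(" "))  # 65 cc 81

# Different code point sequences, so a plain string comparison fails...
print(composed == decomposed)  # False

# ...but normalizing either one to the other's form makes them equal.
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```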

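UTF-8-MAC ensures that the second, decomposed form is always used. Nothing else changes: line endings are left alone, and text that contains only ASCII (or is already decomposed) comes out byte-for-byte identical, which is why the converted file can look unchanged in a hex view.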
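As a rough approximation of iconv -f UTF-8 -t UTF-8-MAC, the conversion can be sketched in Python as "decode, apply NFD, re-encode". The function name to_utf8_mac_approx is hypothetical, and the sketch applies plain NFD, so it does not reproduce the code point ranges that Apple's variant leaves undecomposed.

```python
import unicodedata


def to_utf8_mac_approx(data: bytes) -> bytes:
    """Hypothetical approximation of `iconv -f UTF-8 -t UTF-8-MAC`.

    Decodes UTF-8, applies Unicode NFD, and re-encodes as UTF-8.
    Assumption: plain NFD; the ranges Apple excludes from
    decomposition are NOT handled here.
    """
    return unicodedata.normalize("NFD", data.decode("utf-8")).encode("utf-8")


# ASCII-only input (including \n and \r\n line endings) has nothing to
# decompose, so the output bytes are identical to the input.
ascii_text = b"hello\nworld\r\n"
print(to_utf8_mac_approx(ascii_text) == ascii_text)  # True

# A precomposed é (C3 A9) becomes e + COMBINING ACUTE ACCENT (65 CC 81).
print(to_utf8_mac_approx("caf\u00e9".encode("utf-8")).hex(" "))  # 63 61 66 65 cc 81
```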