How are character encodings related to fonts?


I mean, does a font have to support every character encoding? Or does a character encoding have to support every font?

What does "Unicode font" mean? Are they fonts that support only Unicode and don't support, say, Windows-1252?

Best Answer

To start with the basics, everything is based on US-ASCII, which is a 7-bit code with 128 code points, numbered hex 00 through 7F or decimal 0-127. These are mapped to control codes, English alphanumeric characters, and basic punctuation.

Adding 1 bit to this for an 8-bit code (one byte) gives us another 128 code points, commonly called Extended ASCII.
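To make the 7-bit versus 8-bit arithmetic concrete, here is a small Python sketch. The helper name `fits_in_ascii` is made up for illustration; it only checks whether every character's code point falls in the 0-127 range.

```python
# US-ASCII uses 7 bits; a full byte uses 8.
ascii_points = 2 ** 7   # 128 code points, 0x00-0x7F
byte_points = 2 ** 8    # 256 code points, 0x00-0xFF


def fits_in_ascii(text: str) -> bool:
    """Return True if every character has a code point below 128."""
    return all(ord(ch) < ascii_points for ch in text)


plain = fits_in_ascii("Hello!")    # English letters and punctuation only
accented = fits_in_ascii("café")   # 'é' lies outside the 7-bit range
```

The extra 128 values gained from the eighth bit are what the various code pages assign differently.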

Character sets/code pages were required early on to change how the upper 128 code points (hex 80 through FF) mapped to characters, so that they covered the alphabet of the particular language you wished to represent. This works reasonably well for most western European languages. ISO 8859-1/Latin-1 is an example of such a character set. Another is Windows-1252, which differs from ISO 8859-1 in places so that it can cover more or different characters.
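As a rough illustration of how one byte value means different things under different code pages, the following Python sketch decodes the same byte with a few of the standard library's codecs. The codec names are Python's own aliases; the choice of bytes 0xE9 and 0x80 is just for the example.

```python
# One byte in the upper half of the 8-bit range.
raw = bytes([0xE9])

# Decode the same byte under several single-byte character sets.
codecs_to_try = ("latin-1", "cp1252", "iso8859_5", "iso8859_7")
decoded = {name: raw.decode(name) for name in codecs_to_try}

# Byte 0x80 is one place where Windows-1252 and ISO 8859-1 disagree.
euro_cp1252 = bytes([0x80]).decode("cp1252")    # a printable character
ctrl_latin1 = bytes([0x80]).decode("latin-1")   # a C1 control code

for name, ch in decoded.items():
    print(f"0xE9 as {name}: {ch!r} (U+{ord(ch):04X})")
```

The byte never changes; only the table used to interpret it does.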

Languages with more complex character sets like Chinese, Japanese, and Korean exceed the capabilities of the 256 code point set and use a double-byte code to enable their representation.
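A quick way to see the double-byte behaviour is to encode one character with a few legacy East Asian codecs from Python's standard library. The sample characters are arbitrary picks for this sketch.

```python
# One sample character per legacy double-byte encoding.
cjk_samples = {"shift_jis": "あ", "gbk": "中", "euc_kr": "한"}

# Number of bytes each character occupies in its encoding.
cjk_lengths = {codec: len(ch.encode(codec)) for codec, ch in cjk_samples.items()}

# ASCII characters still take a single byte in these encodings.
ascii_len = len("A".encode("shift_jis"))

print(cjk_lengths, ascii_len)
```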

UTF-8 is a multi-byte encoding of Unicode (1-4 bytes per character) that is backward compatible with US-ASCII: the first 128 code points are encoded as the same single bytes. The first 256 Unicode code points do match ISO 8859-1/Latin-1, but in UTF-8 those above 127 take two bytes. Unicode has room for over 1 million code points, which means that each code point can represent exactly one character, unlike the mucking around done with Extended ASCII, where a code point maps to a different character depending on the character set/code page/encoding.
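The variable length of UTF-8 and its compatibility with ASCII can be checked with a short Python sketch. The four sample characters are arbitrary, picked so that each needs a different number of bytes.

```python
# Characters that need 1, 2, 3 and 4 bytes in UTF-8.
utf8_samples = ["A", "é", "€", "😀"]
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in utf8_samples}

# ASCII text is byte-for-byte identical in UTF-8.
same_as_ascii = "A".encode("utf-8") == "A".encode("ascii")

# 'é' has the same code point as in Latin-1 (0xE9) but different bytes.
same_code_point = ord("é") == 0xE9
same_bytes = "é".encode("utf-8") == "é".encode("latin-1")

# Unicode code points run from 0 to 0x10FFFF inclusive.
total_code_points = 0x10FFFF + 1

print(utf8_lengths, same_as_ascii, same_code_point, same_bytes, total_code_points)
```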

A font is a collection of glyphs that are mapped to code points and visually represent characters. The contents of a font depend on which languages it was originally meant to cover. You can use Character Map to see which glyphs a font contains.
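The idea of a font as a code-point-to-glyph table can be sketched with a toy model in Python. Everything here is hypothetical: the two dictionaries are hand-written stand-ins, not data read from real font files, and `glyphs_for` and `covers` are illustrative helper names.

```python
# Hypothetical "fonts": each maps Unicode code points to glyph names.
toy_latin_font = {0x41: "A", 0x42: "B", 0xE9: "eacute"}
toy_kana_font = {0x41: "A", 0x3042: "a-hira", 0x3044: "i-hira"}


def glyphs_for(text, cmap, missing=".notdef"):
    """Look up a glyph name per character, falling back to a placeholder."""
    return [cmap.get(ord(ch), missing) for ch in text]


def covers(text, cmap):
    """Return True if the font has a glyph for every character in text."""
    return all(ord(ch) in cmap for ch in text)


print(glyphs_for("Aé", toy_latin_font))   # both characters have glyphs
print(glyphs_for("Aあ", toy_latin_font))  # the kana falls back to the placeholder
print(covers("あい", toy_kana_font))
```

The lookup is keyed on the code point alone, so the encoding the text was originally stored in plays no part once it has been decoded.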

Unicode fonts don't necessarily cover all the code points; you need to see where they were intended to be used. For example, in Windows 7, fire up Character Map and view the characters in Calibri, then compare them to Ebrima, Meiryo and Raavi. They are vastly different because each one is tailored to a different geographic region.

As for Unicode fonts and the Windows-1252 character set: Windows uses a mapping table to translate Windows-1252 to Unicode. Where Windows-1252 doesn't match ISO 8859-1, this is a "Best Fit" scenario, and some characters in the Windows-1252 character set may not display.
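To see where the two character sets part ways, the sketch below walks the 0x80-0x9F range using Python's `cp1252` and `latin-1` codecs. It reflects Python's mapping tables only; the table Windows itself uses may treat the unassigned bytes differently.

```python
differing = []   # (byte value, character) pairs where cp1252 differs from Latin-1
undefined = []   # byte values Python's cp1252 codec refuses to decode

for value in range(0x80, 0xA0):
    raw = bytes([value])
    try:
        ch = raw.decode("cp1252")
    except UnicodeDecodeError:
        undefined.append(value)
        continue
    if ch != raw.decode("latin-1"):
        differing.append((value, ch))

# Above 0x9F the two character sets agree.
upper_matches = all(
    bytes([v]).decode("cp1252") == bytes([v]).decode("latin-1")
    for v in range(0xA0, 0x100)
)

# Curly quotes are a typical Windows-1252-only pair.
quoted = b"\x93hi\x94".decode("cp1252")

print(len(differing), [hex(v) for v in undefined], upper_matches, quoted)
```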
