How to find out which unicode codepoints are defined in a TTF file

.ttffontsunicode

I need to automate a process of verification which Unicode characters have actual glyphs defined for them in a True Type Font file. How do I go around doing that? I can't seem to find information on how to make sense of the numbers I seem to be getting when I open a .ttf file in a text editor.

Best Answer

I found a python library, fonttools (pypi) that can be used to do it with a bit of python scripting.

Here is a simple script that lists all fonts that have specified glyph:

#!/usr/bin/env python3

from fontTools.ttLib import TTFont
import sys

char = int(sys.argv[1], base=0)

print("Looking for U+%X (%c)" % (char, chr(char)))

for arg in sys.argv[2:]:
    try:
        font = TTFont(arg)

        for cmap in font['cmap'].tables:
            if cmap.isUnicode():
                if char in cmap.cmap:
                    print("Found in", arg)
                    break
    except Exception as e:
        print("Failed to read", arg)
        print(e)

First argument is codepoint (decimal or hexa with 0x) and the rest is font files to look in.

I didn't bother trying to make it work for .ttc files (it requires some extra parameter somewhere).

Note: I first tried the otfinfo tool, but I only got basic multilingual plane characters (<= U+FFFF). The python script finds extended plane characters OK.

To keep the whole history of this answer.

If you give the command

fc-list -v

it should list for every font the charset property which is a bitmask of which character codes exist in the font. For example, for a simple font like fc-list -v 'Courier 10 Pitch' it has lines:

charset: 
0000: 00000000 ffffffff ffffffff 7fffffff 00000000 ffffffff ffffffff ffffffff
0001: 00000000 00020000 000c0006 61000003 00040000 00000000 00000000 00000000
...
00fb: 00000006 00000000 00000000 00000000 00000000 00000000 00000000 00000000

Take the hex number in the first column, like the last line 00fb, and shift it left by 8 bits. It is the start of the unicode value. The bitmask 00000006 says a glyph exists for codes 1 and 2 (6 = 2+4 = 1<<1 | 1<<2) which you add to the first column to get 00fb01 and 00fb02. (These glyphs are, for example, latin small ligature fi.)

So in the case of U1F32D you need to grep for 01f3: and look for a bit set at index 2d in the line, i.e. 00000000 00002000 ... (probably!). Note the 0's here just show the position of the 2. The actual values may be any hex digit. (A grep pattern might be 01f3: ........ ....[2367abef]). The preceding file: entry should lead you to the package (use rpm -qf filename).

But I'm sure there must be a better way to search for a glyph.

Best Answer

Related Solutions

How to view a TTF font file

How to find which font provides a particular Unicode glyph

To keep the whole history of this answer.

Related Question