Get list of all fonts containing a specific character


On macOS Sierra, I would like to get a usable list of all fonts that cover (contain a glyph for) a particular Unicode character. Is there a way to do that, either via a tool, or the commandline, or writing a program?

Right now, this is what I can do:

  1. Under System Preferences → Keyboard → Keyboard, there is an option to "Show keyboard and emoji viewers in menu bar"
    which I have turned on. This enables a menu in the top-right corner of the screen, next to the time and date.
  2. In that menu, I choose "Show Emoji and Symbols" (which I think was previously called "Character Viewer").


  3. Here I can search for the character, and under "Font Variation" I can click on each rendering to see the corresponding font.


This is usable when there are few enough fonts that contain the character, but is unwieldy (requires a lot of clicking and copying) when the list of fonts is large. What I'd like is to get a list of all such fonts in plain-text copyable form.

How can I do that? I'm willing and happy to write code if necessary.

Best Answer

It's still not clear to me how this is done by macOS itself, but in the meantime here's what I ended up doing.

All the solutions I found had the following form:

  1. Get a list of all fonts available.
  2. Loop over the list to find fonts that contain the selected character.

Listing all fonts

As described in this question, there are two approaches (plus a third one I found here):

  1. system_profiler SPFontsDataType to which you can add -xml to get output in XML,

  2. fc-list which can take a pattern (: is the empty pattern that matches all fonts) and a format specifier.

  3. Install python-fontconfig, then run import fontconfig; fontconfig.query() to get a list of font paths.
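A minimal sketch of approach 2 from Python 3.7+, using only the standard library's subprocess module. It assumes fc-list is installed and on the PATH; the function names are illustrative, not part of any library. It asks fc-list for just the file of each match, and de-duplicates because a .ttc collection can be reported once per face.

```python
import subprocess


def parse_fc_list(output):
    """Turn fc-list output with one path per line into sorted, unique paths."""
    # A .ttc collection may be reported once per face, so de-duplicate.
    return sorted({line.strip() for line in output.splitlines() if line.strip()})


def list_font_paths():
    """Run fc-list with the empty pattern ':' and keep only each font's file."""
    # The raw string passes a literal backslash-n; fc-list expands it itself.
    output = subprocess.run(
        ['fc-list', r'--format=%{file}\n', ':'],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_fc_list(output)
```

Calling list_font_paths() returns the same set of files fc-list prints, one string per file.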

Comparing the two approaches (I wrote this before I had noticed the third one) is interesting:

  • Speed: On my computer and for my set of fonts, fc-list takes about 24 seconds the first time and 0.04 seconds each time after that, while system_profiler consistently takes about 3 seconds each time.

  • Comprehensiveness: On my current system, system_profiler lists 702 fonts while fc-list lists 770: all those 702 plus 68 more. On the one hand, system_profiler seems to be the "official" way, and matches the fonts visible in Font Book, the ones that show up in "Font Variation" in the character/symbol viewer (as in the question), the font menu in TextEdit, etc. On the other hand, at least some of the fonts that it misses are genuinely usable fonts. These include not just the 5 fonts /Library/Fonts/{Athelas.ttc,Charter.ttc,Marion.ttc,Seravek.ttc,SuperClarendon.ttc} about which you can find some confusing pages online (e.g. this and this), but also /Library/Fonts/{DIN Alternate Bold.ttf,DIN Condensed Bold.ttf,Iowan Old Style.ttc} and 57 of the 177 Noto Sans fonts I have installed on my system. For example, I have Noto Sans Brahmi installed, but this font doesn't show up in Font Book or in "Font Variation" when I search for a Brahmi letter, yet it does get used in TextEdit (and displayed in my browser). Whatever the reason for this weirdness, I'm happy that I can get the full list with fc-list.

  • Ease of use: with either method a little bit of parsing the output is required. With fc-list I can specify the format (e.g. fc-list --format="%{family}\n%{file}\n%{lang}\n\n" but I couldn't find a reference for the names of the fields!); with system_profiler I can either just grep for Location: or output to XML and parse the XML (examples with xml.etree.ElementTree, with plistlib).
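Both system_profiler parsing routes can be sketched with the standard library alone. The text route relies only on the Location: lines. The XML route assumes the plist is a list of reports, each with an _items list of dictionaries that store the font file under a path key; those key names are an assumption about the plist layout, not something verified here. The function names are hypothetical.

```python
import plistlib
import subprocess


def paths_from_text(output):
    """Collect the value of every 'Location:' line in plain system_profiler output."""
    prefix = 'Location:'
    return sorted(
        line.strip()[len(prefix):].strip()
        for line in output.splitlines()
        if line.strip().startswith(prefix)
    )


def paths_from_xml(data):
    """Collect font paths from `system_profiler -xml` output (bytes).

    ASSUMPTION: the plist is a list of reports, each with an '_items' list
    whose entries are dicts that store the font file under the key 'path'.
    """
    paths = []
    for report in plistlib.loads(data):
        for item in report.get('_items', []):
            if 'path' in item:
                paths.append(item['path'])
    return sorted(paths)


def run_system_profiler(xml=False):
    """Run system_profiler (macOS only) and return the list of font paths."""
    cmd = ['system_profiler', 'SPFontsDataType'] + (['-xml'] if xml else [])
    out = subprocess.run(cmd, capture_output=True, check=True).stdout
    return paths_from_xml(out) if xml else paths_from_text(out.decode('utf-8'))
```

The text parser is the "grep for Location:" route and works without knowing the plist schema. The XML parser is sturdier against odd formatting in the text output, but only if the key-name assumption holds.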

Does this font cover this character?

However we get the list of fonts, next we have to check whether a character is covered in a specific font (given by name or path). Again, the ways I found:

  • Use one of the FreeType bindings. For Python, there is freetype-py but I couldn't figure out in a few minutes how to use it.

  • Dump the font's cmap table with ttx/fonttools, then loop over the table. This is certainly doable and I've used such dumping many times (one can just ttx foo.ttf to get the foo.ttx xml file which is even human-readable), but for this use-case (searching over all fonts), it's not the best as it takes seconds per font.

  • Look up the cmap table with a library written for that: Font::TTF::Font in Perl, or from fontTools.ttLib import TTFont in Python. In Python this would be something like:

    def has_char(font_path, c):
        """Does the font at `font_path` contain the character `c`?"""
        from fontTools.ttLib import TTFont
        try:
            # `with` closes the font on every path, including errors
            with TTFont(font_path) as font:
                # Each cmap subtable maps code points to glyph names; a
                # membership test avoids scanning every entry.
                return any(ord(c) in table.cmap
                           for table in font['cmap'].tables)
        except Exception as e:
            print('Error while looking at font %s: %s' % (font_path, e))
            return False
    

    Unfortunately it fails on far too many fonts to be useful.

  • If you use the python-fontconfig solution, there's a has_char method, used like: font = fontconfig.FcFont(path); return font.has_char(c)
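One plausible cause of the fontTools failures described above is that many macOS fonts are .ttc collections, and fontTools' TTFont refuses to open a collection unless fontNumber says which face to load. This is an assumption, not something verified against the author's font set. The sketch below reads the collection header with the standard library and then checks every face. Both function names are hypothetical, and fontTools is imported only inside has_char_any_face.

```python
import struct


def font_count(header):
    """Number of faces in an sfnt file, given at least its first 12 bytes.

    A TrueType/OpenType Collection starts with the tag b'ttcf', a 4-byte
    version and a big-endian uint32 face count; any other file holds one face.
    """
    if header[:4] == b'ttcf':
        return struct.unpack('>I', header[8:12])[0]
    return 1


def has_char_any_face(font_path, c):
    """Does any face in the font file at `font_path` map the character `c`?"""
    from fontTools.ttLib import TTFont  # third-party: pip install fonttools

    with open(font_path, 'rb') as f:
        count = font_count(f.read(12))
    for number in range(count):
        # fontNumber only matters for collections; a single-font file ignores it.
        # Errors (unreadable or unsupported files) propagate to the caller.
        with TTFont(font_path, fontNumber=number) as font:
            if 'cmap' in font and any(
                ord(c) in table.cmap for table in font['cmap'].tables
            ):
                return True
    return False
```

Stopping at the first face that maps the character keeps large collections cheap when the answer is yes.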

Summary

I ended up using the solution from here, which I've lightly rewritten to keep it minimal:

#!/usr/bin/env python

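def find_fonts(c):
    """Finds fonts containing the (Unicode) character c."""
    import fontconfig
    # Paths of all fonts fontconfig knows about; a .ttc collection may be
    # listed once per face, so the same path can be yielded more than once.
    fonts = fontconfig.query()
    for path in sorted(fonts):
        font = fontconfig.FcFont(path)
        if font.has_char(c):
            yield path

if __name__ == '__main__':
    import sys
    search = sys.argv[1]
    # Python 2 passes argv as bytes; Python 3 already gives str.
    char = search.decode('utf-8') if isinstance(search, bytes) else search
    for path in find_fonts(char):
        print(path)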

Example usage:

% python3 find_fonts.py 'ಠ'
/Library/Fonts/Arial Unicode.ttf
/Library/Fonts/Kannada MN.ttc
/Library/Fonts/Kannada MN.ttc
/Library/Fonts/Kannada Sangam MN.ttc
/Library/Fonts/Kannada Sangam MN.ttc
/System/Library/Fonts/LastResort.ttf
/Users/shreevatsa/Library/Fonts/Kedage-b.TTF
/Users/shreevatsa/Library/Fonts/Kedage-i.TTF
/Users/shreevatsa/Library/Fonts/Kedage-n.TTF
/Users/shreevatsa/Library/Fonts/Kedage-t.TTF
/Users/shreevatsa/Library/Fonts/NotoSansKannada-Bold.ttf
/Users/shreevatsa/Library/Fonts/NotoSansKannada-Regular.ttf
/Users/shreevatsa/Library/Fonts/NotoSansKannadaUI-Bold.ttf
/Users/shreevatsa/Library/Fonts/NotoSansKannadaUI-Regular.ttf
/Users/shreevatsa/Library/Fonts/NotoSerifKannada-Bold.ttf
/Users/shreevatsa/Library/Fonts/NotoSerifKannada-Regular.ttf
/Users/shreevatsa/Library/Fonts/akshar.ttf

(Works with both python3 and python2, whichever python you have. Takes about 29 seconds on my computer, for the set of fonts I have installed.)
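For comparison, fontconfig can also do the coverage test itself. An fc-list pattern with a charset property lists only fonts whose character set includes the given code point, written in hex (e.g. :charset=ca0 for ಠ, U+0CA0), which avoids the python-fontconfig binding entirely. A minimal sketch for Python 3.7+, assuming fc-list is installed; the helper names are hypothetical, and unlike the script above it de-duplicates paths.

```python
import subprocess


def charset_pattern(c):
    """fontconfig pattern matching fonts whose charset contains `c` (hex code point)."""
    return ':charset=%x' % ord(c)


def parse_paths(output):
    """One path per line -> sorted unique paths (collections may repeat)."""
    return sorted({line.strip() for line in output.splitlines() if line.strip()})


def find_fonts_fc_list(c):
    """Ask fc-list for every font file covering `c`; assumes fc-list is installed."""
    result = subprocess.run(
        ['fc-list', r'--format=%{file}\n', charset_pattern(c)],
        capture_output=True, text=True, check=True,
    )
    return parse_paths(result.stdout)
```

For example, find_fonts_fc_list('ಠ') runs fc-list with the pattern :charset=ca0 and returns the matching files, each listed once.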