Ubuntu – Detect missing glyphs in text


I have written a Python3 appindicator which calls fortune and captures the output for display in the on-screen notification.

Some fortunes are displayed with squares containing a hexadecimal number wherever the current font has no glyph for a character. Each square shows, in hexadecimal, the Unicode code point of the character whose glyph is missing.

I want to remove these unrenderable characters before displaying the text to the user. I was hoping to find a Python API that would let me inspect the text character by character to determine something like char.isValidCodePoint(), but I cannot find one.
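As far as I know, the Python standard library has no per-font coverage check. However, once you have the set of code points a font covers (for example, from the font's cmap table), the character-by-character filter itself is simple. The following is a minimal sketch: `strip_unsupported` and the `supported` set are hypothetical names, and control characters such as newlines are kept on the assumption that they control layout rather than being drawn as glyphs.

```python
from __future__ import annotations

import unicodedata


def strip_unsupported(text: str, supported: set[int]) -> str:
    """Drop every character whose code point is not in `supported`.

    Control characters (Unicode category "Cc", e.g. newline and tab) are
    always kept, assuming they are layout controls rather than glyphs.
    """
    return "".join(
        ch for ch in text
        if ord(ch) in supported or unicodedata.category(ch) == "Cc"
    )


# Hypothetical coverage set: printable ASCII only.
ascii_only = set(range(0x20, 0x7F))
print(strip_unsupported("Snow \u2603 man\n", ascii_only))
```

With a real font, `supported` would come from that font's character map rather than from a hard-coded range.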

I found a possible solution here that I wanted to investigate, but after installing fonttools via the terminal, my Python program could not import fonttools/fontTools.

Any ideas – either using the Python API or calling out to a terminal?

Update #1: I have since realised that the fonttools sample code from the link above will not work for me, because it is written for Python 2. I suppose that if fonttools could somehow be used, I could invoke a Python 2 interpreter from my Python 3 script.
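For what it is worth, current fontTools releases run on Python 3, so a Python 2 interpreter should not be needed. The importable package name is `fontTools`, with that capitalisation. The following is a minimal sketch that reads a font file's cmap through `TTFont.getBestCmap()`. It assumes fontTools is installed for the same interpreter that runs the indicator (for example with `python3 -m pip install fonttools`). The helper name `code_points_in_font` is hypothetical.

```python
from __future__ import annotations


def code_points_in_font(path: str, font_number: int = 0) -> set[int]:
    """Return the set of Unicode code points mapped by the font at `path`.

    `font_number` selects a face inside a .ttc collection; fontTools
    ignores it for single-font files.
    """
    # Imported lazily so this module still loads where fontTools is absent.
    from fontTools.ttLib import TTFont

    with TTFont(path, fontNumber=font_number) as font:
        # getBestCmap() returns {code point: glyph name}, or None if the
        # font has no usable Unicode cmap subtable.
        cmap = font.getBestCmap()
        return set(cmap) if cmap else set()
```

For example, `0x2603 in code_points_in_font(path)` tells you whether U+2603 is covered. On a fontconfig system, `fc-match --format=%{file} Ubuntu` should print a font file `path` to try.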

Update #2: After lots of reading (see references below), I have since found fc-match, but it cannot always uniquely identify the font in use. I obtain the current font in Python:

from gi.repository import Gio
# "font-name" holds the family followed by the point size
fontName = Gio.Settings.new("org.gnome.desktop.interface").get_string("font-name")

which returns "Ubuntu 11". When I pass this font name to pango-view along with the offending character, I get a list of fonts that includes Ubuntu. To my thinking, if the glyph was NOT rendered by that font, the font should not appear in pango-view's output!
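Rather than relying on pango-view, which goes through Pango's own font selection, you could ask fontconfig directly whether any font of a given family covers a code point. `fc-list` lists only the fonts that match the whole pattern, and a `charset=` element in the pattern (with the code point in hexadecimal) restricts the listing to fonts that contain that character. The following is a minimal sketch. It assumes the GNOME setting has the form "Family Size", so "Ubuntu 11" becomes the family "Ubuntu". Style words such as "Bold" are not handled, and the family name is not escaped for fontconfig. `family_from_font_name`, `fc_list_pattern` and `family_has_glyph` are hypothetical helper names.

```python
import re
import subprocess


def family_from_font_name(font_name: str) -> str:
    """Strip a trailing point size, e.g. "Ubuntu 11" -> "Ubuntu"."""
    return re.sub(r"\s+\d+(\.\d+)?$", "", font_name.strip())


def fc_list_pattern(family: str, ch: str) -> str:
    """Build a fontconfig pattern for `family` fonts that contain `ch`."""
    return f"{family}:charset={ord(ch):x}"


def family_has_glyph(family: str, ch: str) -> bool:
    """True if fc-list reports at least one `family` font covering `ch`."""
    result = subprocess.run(
        ["fc-list", fc_list_pattern(family, ch), "family"],
        capture_output=True, text=True, check=True,
    )
    # An empty listing means no installed font of that family matched.
    return bool(result.stdout.strip())
```

Combined with the code above, `family_has_glyph(family_from_font_name(fontName), "\u2603")` should be False if no installed Ubuntu font covers U+2603.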

References:

Best Answer

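This is a different approach from the one you were pursuing, but perhaps you could simply use Python's str.replace() or re.sub() to strip the offending characters from your text. Note that the hexadecimal box is only how the desktop draws a character it cannot render; the string itself contains the original character, so that character is what you remove. For example: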

If you know exactly which characters to remove:

# In Python 3, "\xc3\xa5" is two code points (U+00C3, U+00A5), not UTF-8 bytes
originalText = "\xc3\xa5Test"
filteredText = originalText.replace("\xc3\xa5", "")  # "Test"

Or, to remove every non-ASCII character with a regular expression (note that this also strips characters the font can display, such as accented letters):

import re

originalText = "\xc3\xa5Test"
# Delete everything outside the ASCII range U+0000 to U+007F
filteredText = re.sub(r'[^\x00-\x7f]', '', originalText)  # "Test"

Further discussion of this strategy
