Linux – How to set fallback encoding to UTF-8 in Firefox

arch linux, character encoding, firefox

I've written a Norwegian markdown document:

$ file brukerveiledning.md
brukerveiledning.md: UTF-8 Unicode text

I've converted it to HTML using the markdown command:

$ markdown > brukerveiledning.html <  brukerveiledning.md 
$ file brukerveiledning.html 
brukerveiledning.html: UTF-8 Unicode text

However, Firefox insists on using the "windows-1252" encoding, breaking the non-ASCII characters. I've tried changing the fallback text encoding from "Default for Current Locale" (which here in the UK should be either ISO-8859-1 or UTF-8) to "Central European, ISO", "Central European, Microsoft", and "Other (incl. Western European)". None of these can display æ, ø and å. There are no Unicode options. I've also tried changing intl.fallbackCharsetList.ISO-8859-1 in about:config to various values like utf8, utf-8, iso-8859-1, with no luck.

Using this markdown package:

$ pacman --query --owns "$(which markdown)"
/usr/bin/markdown is owned by markdown 1.0.1-6

and this locale:

$ locale 
LANG=en_GB.utf8
LC_CTYPE="en_GB.utf8"
LC_NUMERIC="en_GB.utf8"
LC_TIME="en_GB.utf8"
LC_COLLATE="en_GB.utf8"
LC_MONETARY="en_GB.utf8"
LC_MESSAGES="en_GB.utf8"
LC_PAPER="en_GB.utf8"
LC_NAME="en_GB.utf8"
LC_ADDRESS="en_GB.utf8"
LC_TELEPHONE="en_GB.utf8"
LC_MEASUREMENT="en_GB.utf8"
LC_IDENTIFICATION="en_GB.utf8"
LC_ALL=

I tried to ask for a solution at the markdown command level, but that was rejected.

Best Answer

Update: this has been fixed since Firefox 66

UTF-8-encoded HTML (and plain text) files loaded from file: URLs are now supported without <meta charset="utf-8"> or the UTF-8 BOM

https://developer.mozilla.org/en-US/docs/Mozilla/Firefox/Releases/66#HTML


Historical information from 2016

The reasoning behind this behavior seems to be described in Mozilla bugs 815551 (Autodetect UTF-8 by default) and 1071816 (Support loading BOMless UTF-8 text/plain files from file: URLs).

As far as I understand, it basically boils down to "one should always specify the encoding, as detection is too unreliable".

  • For non-local content you should leverage the protocol. With HTTP, this means sending the correct charset in the Content-Type header.
  • For HTML content you may additionally declare the encoding in the document itself with a meta element in the head, i.e. <meta charset="utf-8" />
  • And for anything else the only standard way left is to add a BOM...
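
The last two options can be applied to the generated file from the shell. The following is only a minimal sketch, assuming a POSIX shell; wrap_utf8_html and prepend_utf8_bom are hypothetical helper names made up for illustration, not part of the markdown package, and the wrapper emits a bare-bones document rather than a fully conformant one (for instance, it has no title).

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not shipped with markdown.

# Read an HTML fragment on stdin and write a complete document that
# declares its encoding with a <meta> element near the top.
wrap_utf8_html() {
    printf '%s\n' '<!DOCTYPE html>' '<html>' '<head>' \
        '<meta charset="utf-8">' '</head>' '<body>'
    cat
    printf '%s\n' '</body>' '</html>'
}

# Read any text on stdin and write it back with a UTF-8 byte order
# mark (bytes EF BB BF, given here as octal escapes) in front.
prepend_utf8_bom() {
    printf '\357\273\277'
    cat
}
```

With these defined, the conversion from the question would become something like `markdown < brukerveiledning.md | wrap_utf8_html > brukerveiledning.html`. The meta variant is probably the better choice for HTML, since a BOM is invisible in most editors and can confuse tools that do not expect it.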

Mozilla devs seem to be open to a patch that adds a preference setting, so one day it might be possible to open local BOM-less UTF-8 documents in Firefox.
