I've written a Norwegian markdown document:
$ file brukerveiledning.md
brukerveiledning.md: UTF-8 Unicode text
I've converted it to HTML using the markdown command:
$ markdown > brukerveiledning.html < brukerveiledning.md
$ file brukerveiledning.html
brukerveiledning.html: UTF-8 Unicode text
However, Firefox insists on using the "windows-1252" encoding, which breaks the non-ASCII characters. I've tried changing the fallback text encoding from "Default for Current Locale" (which here in the UK should be either ISO-8859-1 or UTF-8) to "Central European, ISO", "Central European, Microsoft", and "Other (incl. Western European)". None of these displays æ, ø and å correctly, and there are no Unicode options. I've also tried changing intl.fallbackCharsetList.ISO-8859-1 in about:config to various values such as utf8, utf-8 and iso-8859-1, with no luck.
I'm using this markdown package:
$ pacman --query --owns "$(which markdown)"
/usr/bin/markdown is owned by markdown 1.0.1-6
and this locale:
$ locale
LANG=en_GB.utf8
LC_CTYPE="en_GB.utf8"
LC_NUMERIC="en_GB.utf8"
LC_TIME="en_GB.utf8"
LC_COLLATE="en_GB.utf8"
LC_MONETARY="en_GB.utf8"
LC_MESSAGES="en_GB.utf8"
LC_PAPER="en_GB.utf8"
LC_NAME="en_GB.utf8"
LC_ADDRESS="en_GB.utf8"
LC_TELEPHONE="en_GB.utf8"
LC_MEASUREMENT="en_GB.utf8"
LC_IDENTIFICATION="en_GB.utf8"
LC_ALL=
I tried to ask for a solution at the markdown command level, but that was rejected.
Best Answer
Update: this has been fixed since Firefox 66
https://developer.mozilla.org/en-US/docs/Mozilla/Firefox/Releases/66#HTML
Historical information from 2016
The reasoning behind this behavior seems to be described in Mozilla bugs 815551 (Autodetect UTF-8 by default) and 1071816 (Support loading BOMless UTF-8 text/plain files from file: URLs).
As far as I understand, it basically boils down to "one should always specify the encoding, as detection is too unreliable". The encoding can be declared either with a charset parameter in the Content-Type header or with <meta charset="utf-8" /> in the document itself.
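For a local file there is no Content-Type header, so the declaration has to live in the document. The markdown command used in the question appears to emit only an HTML fragment, with no head section at all, so one way to follow the advice is to wrap its output in a minimal document that carries the meta element. The sketch below assumes a POSIX shell; wrap_utf8_html is a hypothetical helper name, not part of markdown or Firefox, and the skeleton is deliberately bare (no title, no lang attribute).

```shell
# Hypothetical helper: read an HTML fragment on stdin and write a minimal
# document on stdout that declares its encoding as UTF-8.
wrap_utf8_html() {
    # Opening boilerplate, including the encoding declaration.
    printf '%s\n' '<!DOCTYPE html>' '<html>' '<head>' \
        '<meta charset="utf-8" />' '</head>' '<body>'
    # Pass the fragment through byte for byte.
    cat
    # Closing boilerplate.
    printf '%s\n' '</body>' '</html>'
}

# Intended use, assuming the markdown command from the question is on PATH:
#   markdown < brukerveiledning.md | wrap_utf8_html > brukerveiledning.html
```

The function never re-encodes anything; it only adds the declaration around bytes that are already UTF-8.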
Mozilla devs seem to be open to a patch that adds a preference setting, so one day it might be possible to open local BOM-less UTF-8 documents in Firefox.
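The mention of BOM-less documents suggests the other workaround for local files: a UTF-8 byte order mark at the very start of the file, which browsers treat as an encoding declaration. A minimal sketch, assuming a POSIX shell; prepend_bom is a hypothetical helper name, and a BOM can confuse other tools that read the file, so the meta element is usually the less intrusive choice.

```shell
# Hypothetical helper: copy stdin to stdout with a UTF-8 byte order mark
# (bytes EF BB BF, written here as octal escapes) placed in front.
prepend_bom() {
    printf '\357\273\277'
    cat
}

# Intended use, assuming the markdown command from the question is on PATH:
#   markdown < brukerveiledning.md | prepend_bom > brukerveiledning.html
```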