Why are non-ASCII characters displayed as question marks?

i18n, locale, unicode, yocto

I'm working on an embedded Linux distribution based on Yocto Morty.

I used an Ubuntu system to create the following two files:

  • fòò.dàt
  • bàr.dàt

I copied the files onto a pendrive and connected the pendrive to my embedded system.

I used PuTTY to connect to the embedded system over a serial line and browse the contents of the pendrive. The files are listed as follows:

root@imx6qsabresd:/media/linux_desktop# ls -la
total 8
drwxr-xr-x 2 root root 4096 Mar 17  2017 .
drwxr-xr-x 9 root root 4096 Jan  1  1970 ..
-rwxr-xr-x 1 root root    0 Mar 17  2017 b?r.d?t
-rwxr-xr-x 1 root root    0 Mar 17  2017 f??.d?t

The locale of the Ubuntu distribution is:

user@user-VirtualBox:~$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=it_IT.UTF-8
LC_TIME=it_IT.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=it_IT.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=it_IT.UTF-8
LC_NAME=it_IT.UTF-8
LC_ADDRESS=it_IT.UTF-8
LC_TELEPHONE=it_IT.UTF-8
LC_MEASUREMENT=it_IT.UTF-8
LC_IDENTIFICATION=it_IT.UTF-8
LC_ALL=

The locale of the embedded distribution is:

root@imx6qsabresd:/media/linux_desktop# locale
LANG=en_US
LC_CTYPE="en_US"
LC_NUMERIC="en_US"
LC_TIME="en_US"
LC_COLLATE="en_US"
LC_MONETARY="en_US"
LC_MESSAGES="en_US"
LC_PAPER="en_US"
LC_NAME="en_US"
LC_ADDRESS="en_US"
LC_TELEPHONE="en_US"
LC_MEASUREMENT="en_US"
LC_IDENTIFICATION="en_US"
LC_ALL=en_US

Even though the .UTF-8 suffix isn't stated explicitly, I assume the embedded system's locale is UTF-8 because:

root@imx6qsabresd:/media/linux_desktop# locale charmap
UTF-8

See https://stackoverflow.com/a/42797421/5321161 for further details.

Below is the list of locales currently installed on my embedded distribution:

root@imx6qsabresd:/media/linux_desktop# locale -a
C
de_DE
en_GB
en_GB.ISO-8859-1
en_US
en_US.ISO-8859-1
fr_FR
POSIX
zh_CN

The PuTTY terminal emulator is configured to use UTF-8 as the remote character set.

Why are accented characters replaced by question marks?

Best Answer

The problem was caused by the way the pendrive was mounted. I usually mount the device without specifying any options, e.g.:

mount /dev/sdb1 /media

The result is:

/dev/sdb1 on /media type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)
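To confirm which iocharset the kernel actually applied to a mounted volume, the options can be read back from /proc/mounts. The following is a minimal POSIX sh sketch; get_iocharset and mount_options are hypothetical helper names, not part of any existing tool, and the lookup assumes the mount point contains no spaces (the kernel writes those as \040 in /proc/mounts).

```shell
#!/bin/sh
# Print the value of iocharset= from a comma-separated mount option string,
# e.g. the part between parentheses in the mount output above.
# Prints nothing if no iocharset= entry is present.
get_iocharset() {
    printf '%s\n' "$1" | tr ',' '\n' | sed -n 's/^iocharset=//p'
}

# Print the option string of the filesystem mounted on directory $1.
# Fields in /proc/mounts: device, mount point, type, options, dump, pass.
# If several filesystems are stacked on the same directory, the last one
# listed is the one currently visible, so keep the last match.
# Assumes the mount point has no spaces (the kernel encodes them as \040).
mount_options() {
    awk -v mnt="$1" '$2 == mnt { opts = $4 } END { if (opts != "") print opts }' /proc/mounts
}
```

On the board, get_iocharset "$(mount_options /media)" should print iso8859-1 for a mount like the one shown above, and utf8 after remounting with iocharset=utf8.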

As described in the mount(8) man page (https://linux.die.net/man/8/mount), the default value of the iocharset option is iso8859-1.
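Why this shows up as ? in particular: a VFAT volume stores long file names as 16-bit Unicode, and iocharset selects the encoding the kernel converts them to for user space. With iso8859-1, à becomes the single byte 0xE0, which is not valid UTF-8, so ls on a UTF-8 terminal cannot print it and substitutes ?. The sketch below (POSIX sh, using only printf, od and tr) makes the byte difference visible. The final tr line is a crude stand-in for ls that turns every non-ASCII byte into ?; it is an illustration, not what ls literally does internally.

```shell
#!/bin/sh
# Show the bytes user space receives for the name "dàt" under each iocharset.
# Octal escapes are used so the result does not depend on this script's encoding.

# Print the bytes of $1 as lowercase hex without spaces, e.g. "64e074".
show_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

utf8_name=$(printf 'd\303\240t')   # iocharset=utf8: à (U+00E0) -> C3 A0
latin1_name=$(printf 'd\340t')     # iocharset=iso8859-1: à -> single byte E0

show_bytes "$utf8_name"    # 64c3a074
show_bytes "$latin1_name"  # 64e074

# 0xE0 opens a 3-byte UTF-8 sequence, but the next byte is 't' (0x74), not a
# continuation byte (0x80-0xBF), so a UTF-8 terminal cannot decode it.
# Crude stand-in for ls: replace every non-ASCII byte with '?'.
printf '%s\n' "$latin1_name" | LC_ALL=C tr '\200-\377' '?'   # d?t
```

This matches the d?t seen in the original listing: one ? per accented character, because each one arrived as a single undecodable byte.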

Mounting the pendrive with the option iocharset=utf8 solved the problem:

mount -o iocharset=utf8 /dev/sdb1 /media
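The -o option only applies to that one mount. One possible way to apply it automatically is an /etc/fstab entry; the fragment below is a sketch that reuses the device and mount point from the command above, which may differ on your board (removable devices are not guaranteed to keep the same /dev name). The Linux kernel's vfat documentation notes that iocharset=utf8 is not recommended and suggests the separate utf8 mount option instead, so that variant is included as a commented alternative worth testing.

```
# <device>  <mount point>  <type>  <options>       <dump>  <pass>
/dev/sdb1   /media         vfat    iocharset=utf8  0       0
# Alternative suggested by the kernel's vfat documentation:
# /dev/sdb1 /media         vfat    utf8            0       0
```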

With this option, UTF-8 characters are displayed correctly in the console:

root@imx6qsabresd:/media/win/mix# ls -la
total 28
drwxr-xr-x 7 root root 4096 Mar 13 15:19 .
drwxr-xr-x 9 root root 4096 Mar 16  2017 ..
drwxr-xr-x 2 root root 4096 Mar 13 15:13 Île-de-France
-rwxr-xr-x 1 root root    0 Mar 13 15:13 Île-de-France.txt
drwxr-xr-x 2 root root 4096 Mar 13 15:14 madrileños
-rwxr-xr-x 1 root root    0 Mar 13 15:15 madrileños.txt
drwxr-xr-x 2 root root 4096 Mar 13 14:58 mà_però
-rwxr-xr-x 1 root root    0 Mar 13 14:57 mà_però.txt
drwxr-xr-x 2 root root 4096 Mar 13 15:12 Märkisch-Oderland
-rwxr-xr-x 1 root root    0 Mar 13 15:13 Märkisch-Oderland.txt
drwxr-xr-x 2 root root 4096 Mar 13 15:08 أبو ظبي
-rwxr-xr-x 1 root root    0 Mar 13 15:09 أبو ظبي.txt