UTF-8 – CLI Program ‘less’ Interprets UTF-8 Text File as Binary


I have a file that contains non-ASCII UTF-8 characters. When I view it with less, I get the warning "may be a binary file. See it anyway?" But the file is clearly not binary, and when I do open it, the characters are not rendered correctly. What makes less believe the file is binary? Note that the real file has many more lines of plain ASCII text that I've cut out for brevity; this is a semi-minimal example that reproduces the behavior.

More context:

$ cat broken.log
⋮
⋮ =✓)
$ head broken.log
⋮
⋮ =✓)
$ less broken.log
"broken.log" may be a binary file.  See it anyway?

<E2><8B><AE>
<E2><8B><AE> =<E2><9C><93>)
broken.log (END)

$ file broken.log
broken.log: UTF-8 Unicode text
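The <E2><8B><AE> and <E2><9C><93> sequences that less prints are the raw UTF-8 bytes of ⋮ (U+22EE) and ✓ (U+2713), shown as hex escapes instead of characters. A small shell sketch to confirm this, assuming `od` and `xargs` are available (the helper name `utf8_bytes` is hypothetical):

```shell
# Hypothetical helper: print the bytes of a string as lowercase hex,
# the same bytes less displays as <E2><8B><AE> when it treats them as binary.
utf8_bytes() {
  # od dumps one byte per field; xargs joins od's output lines into one line.
  printf '%s' "$1" | od -An -tx1 | xargs
}

utf8_bytes '⋮'   # e2 8b ae  (U+22EE VERTICAL ELLIPSIS)
utf8_bytes '✓'   # e2 9c 93  (U+2713 CHECK MARK)
```

Every byte is above 0x7F, so a pager that is not running in a UTF-8 locale has no printable character to map them to.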

OS:

$ cat /etc/os-release  
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

LESS: I'm pretty sure it's version 487-0.1.

ENV:

$ env | grep LANG
LANG=en_US.UTF-8
$ locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
$ which less
/usr/bin/less
$ ls -la $(which less)
lrwxrwxrwx 1 root root 9 Jul 20 15:49 /usr/bin/less -> /bin/less
$ ls -la /bin/less
-rwxr-xr-x 1 root root 166664 May  7  2018 /bin/less
$ type -a less
less is /usr/bin/less
less is /bin/less
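The three "Cannot set ... to default locale" errors from `locale` suggest that en_US.UTF-8 is named in LANG but not actually installed. A minimal shell sketch to check that, assuming a glibc system where `locale -a` prints installed UTF-8 locales as e.g. en_US.utf8; the helper names `normalize_locale` and `locale_installed` are hypothetical:

```shell
# Hypothetical helper: map "en_US.UTF-8" (the LANG spelling) to "en_US.utf8",
# the spelling glibc's `locale -a` typically uses for installed locales.
normalize_locale() {
  printf '%s\n' "$1" | sed 's/\.[Uu][Tt][Ff]-\{0,1\}8$/.utf8/'
}

# Hypothetical helper: succeed if the locale appears in `locale -a`,
# matching either the name as given or its normalized spelling.
locale_installed() {
  want=$(normalize_locale "$1")
  locale -a 2>/dev/null | grep -qxF -e "$1" -e "$want"
}

if locale_installed "${LANG:-C}"; then
  echo "${LANG:-C} is installed"
else
  echo "${LANG:-C} is NOT installed; less will likely fall back to ASCII"
fi
```

If the locale cannot be installed, setting LESSCHARSET=utf-8 (for example `LESSCHARSET=utf-8 less broken.log`) should make less treat the input as UTF-8 regardless of the locale, although the terminal itself still has to render UTF-8.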

Best Answer

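This was solved by the answer at https://stackoverflow.com/questions/43708896/unable-to-locate-package-language-pack-en

The `locale` errors in the question are the clue: LANG names en_US.UTF-8, but that locale is not installed on the system, so less cannot switch to UTF-8 and treats every byte above 0x7F as a non-printable (binary) character. Installing the locale data and setting the locale variables, here in a Dockerfile, fixes it: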

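# Refresh the package lists first; a fresh Debian image has none, so install would fail
RUN apt-get update && apt-get install -y locales locales-all \
    && rm -rf /var/lib/apt/lists/*
ENV LC_ALL=en_US.UTF-8
ENV LANG=en_US.UTF-8
ENV LANGUAGE=en_US.UTF-8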
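`locales-all` installs every locale, which is large. On Debian a lighter alternative is to install only `locales` and generate just en_US.UTF-8: /etc/locale.gen ships with each entry commented out, and `locale-gen` builds the uncommented ones. A sketch of the file edit, using a hypothetical helper `enable_locale`:

```shell
# Hypothetical helper: uncomment one entry in a locale.gen-style file.
# Debian's /etc/locale.gen lists supported locales as "# <name> <charset>".
# The entry must not contain "/" because it is spliced into a sed expression.
enable_locale() {
  file=$1
  entry=$2
  # Write to a temporary file and move it back (avoids the non-portable sed -i).
  sed "s/^# *$entry\$/$entry/" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Sketch of use on Debian, as root, after `apt-get install -y locales`:
#   enable_locale /etc/locale.gen "en_US.UTF-8 UTF-8"
#   locale-gen
```

The ENV lines are still needed afterwards so that processes in the image actually select the generated locale.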