Linux – Use `less` pager on file with non-standard encoding


I often use the less pager to view logfiles. Usually I use less +F to follow the progress of the log, à la tail -f.

However, some logfiles contain national characters in an encoding (Latin-1) that differs from the one the system uses (UTF-8). Naturally, these are not displayed correctly.

How can I view such files with less?

The only solutions I found:

  • Correct the encoding of the file (recode or iconv). This does not work while the file is still being written, so it does not let me use less +F. It also destroys the logfile's original timestamp, which is bad from an auditing perspective.
  • Use a pipe (recode latin1... | less). This works for files in progress, but then less +F does not appear to work: the display just does not update. I believe the recode process exits once it reaches the end of the file.

Is there any solution that lets me "tail" a logfile and still shows national characters correctly?

Best Answer

Hm, apparently less cannot do this. The part of less's source code that implements "following" seems to be:

    case A_F_FOREVER:
            /*
             * Forward forever, ignoring EOF.
             */
            if (ch_getflags() & CH_HELPFILE)
                    break;
            cmd_exec();
            jump_forw();
            ignore_eoi = 1;
            while (!sigs)
            {
                    make_display();
                    forward(1, 0, 0);
            }
            ignore_eoi = 0;

As far as my (limited) knowledge of C goes, this means that if "follow" is activated, less will:

  1. jump to the end of the input (jump_forw())
  2. read and update the display in a loop until a signal arrives, e.g. from Ctrl-C (the while (!sigs) loop)

If the input is a pipe, step 1 will not return until the pipe signals EOF. If I use tail -f xx | less, the pipe never signals EOF, so less hangs :-(.
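To see why a real file behaves differently from a pipe, here is a minimal stdio sketch of "forward forever, ignoring EOF". It is not less's code, and follow_step is a hypothetical name. On a regular file, reading at the end returns EOF immediately, so a follower can clear the EOF indicator, wait, and try again later. On a pipe, getting to "the end" means reading until the writer closes its side, which tail -f never does.

```c
#include <stdio.h>

/*
 * Hypothetical helper: copy whatever bytes are currently available
 * from `f` to `out`, then clear the EOF indicator so that a later
 * call can pick up data appended to the file in the meantime.
 * Returns the number of bytes copied (0 means nothing new yet).
 * On a regular file getc() returns EOF as soon as the current end
 * is reached; on a pipe it would block until the writer writes
 * more data or closes the pipe.
 */
size_t follow_step(FILE *f, FILE *out)
{
    size_t copied = 0;
    int c;

    while ((c = getc(f)) != EOF) {
        putc(c, out);
        copied++;
    }
    clearerr(f);  /* forget EOF: the file may grow later */
    fflush(out);
    return copied;
}
```

A caller would loop forever, calling follow_step and sleeping briefly (for example with POSIX sleep(1)) whenever it returns 0, which is roughly how following a growing regular file works.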

I did however find a way to get what I want:

    tail -f inputfile | recode latin1.. > /tmp/tmpfile

then, in a second terminal (tail -f never exits, so the pipeline keeps running):

    less +F /tmp/tmpfile

This works because less +F now operates on a real file. It's still somewhat awkward, because recode apparently processes data only in blocks of 4096 bytes, so new lines show up in bursts rather than immediately, but it works...
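If the 4096-byte delay comes from recode buffering its output (an assumption: output written to a file rather than a terminal is normally block-buffered), a small filter that converts Latin-1 to UTF-8 and flushes after every line could take recode's place in the pipeline. The sketch below handles only Latin-1 input; latin1_to_utf8_stream is a hypothetical name. It relies on the fact that every Latin-1 byte equals the Unicode code point of its character, so bytes below 0x80 pass through unchanged and bytes 0x80 to 0xFF become two-byte UTF-8 sequences.

```c
#include <stdio.h>

/*
 * Hypothetical filter: copy Latin-1 (ISO-8859-1) text from `in` to
 * `out` as UTF-8, flushing `out` after every newline so that a
 * reader following the output file (e.g. less +F) sees each complete
 * line right away instead of in large blocks.
 * Returns 0 on success, -1 on a write error.
 */
int latin1_to_utf8_stream(FILE *in, FILE *out)
{
    int c;

    while ((c = getc(in)) != EOF) {
        if (c < 0x80) {
            /* ASCII range: identical in Latin-1 and UTF-8 */
            if (putc(c, out) == EOF)
                return -1;
        } else {
            /* 0x80..0xFF -> 110000xx 10xxxxxx */
            if (putc(0xC0 | (c >> 6), out) == EOF ||
                putc(0x80 | (c & 0x3F), out) == EOF)
                return -1;
        }
        if (c == '\n' && fflush(out) == EOF)
            return -1;
    }
    return fflush(out) == EOF ? -1 : 0;
}
```

Called as latin1_to_utf8_stream(stdin, stdout) from a small main and compiled to a binary named, say, latin1cat (hypothetical), it would slot in as tail -f inputfile | ./latin1cat > /tmp/tmpfile. recode and iconv support far more encodings; the point of the sketch is only the per-line flush.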
