“tail -f | iconv -fsjis” does not output anything

buffer tail

I want to tail -f a file, but its content is in sjis encoding, so I need to have it converted to the native (utf-8) encoding of my terminal.

When I do

tail -f x | iconv -fsjis

there will be no output. As

tail x | iconv -fsjis

does work, at first I thought it was a buffering issue, but trying unbuffer and stdbuf, as described in "Turn off buffering in pipe", did not help.

In fact, even after more than 10k of data had been added to x, there was still no output, so I guess it is not a buffering issue (the buffer is 4k, if I'm not mistaken). Rather, iconv seems to start outputting only when it receives an EOF.

So how can I tail-follow my sjis encoded file?

Best Answer

(take this with a pinch of salt) As far as I remember, the problem lies in the way libiconv works. Multi-byte encodings need a state machine to decode them, and libiconv prefers to receive entire characters, so you can't just give it half a character in one function call and the other half in the next.
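As a rough sketch of how to stay within that whole-character constraint while still using iconv: feed it one complete line at a time. This is a common workaround rather than one of the solutions described in this answer. It assumes that your iconv accepts the encoding name SJIS, and the function name sjis_lines_to_utf8 is made up for illustration. A newline byte never occurs inside a Shift JIS character, so splitting on lines does not cut a character in half.

```shell
# Hypothetical helper: convert a Shift JIS stream on stdin to UTF-8,
# one line at a time, so each iconv run sees only whole characters
# and reaches EOF (and therefore prints) right away.
sjis_lines_to_utf8() {
    # IFS= keeps leading/trailing whitespace intact.
    # -r keeps backslashes literal; byte 0x5C can be the second byte
    # of a Shift JIS character, so this matters.
    # The "|| [ -n ... ]" part handles a final line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line" | iconv -f SJIS -t UTF-8
    done
}
```

It would be used as tail -f x | sjis_lines_to_utf8. Starting a new iconv process for every line is slow, so this is only reasonable for files that grow at a modest rate.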

I can think of a few other solutions: one is a good out-of-band method, the others are in-band.

Change Terminal Emulator encoding (out-of-band): one is to change the character encoding in your terminal emulator, so its native encoding is Shift JIS. I just checked konsole, and it supports this. From the menu, View→Character encoding→Japanese→sjis. You can then just tail -f the file, and konsole will take care of decoding the multibyte characters and matching them up to font glyphs.

Transcode terminal encoding on the fly (in-band; best): courtesy of Gilles, who reminded me of luit after a very long time. Use luit, which should have come with your XOrg distribution (on Debian, it's package x11-utils). Use it like this:

$ luit -encoding SJIS -- tail -f x

This runs tail -f x under luit, which transcodes between SJIS and your terminal's encoding in both directions. The downside of luit is that it doesn't support the wealth of encodings supported by libiconv. The upside is that it's available almost everywhere.

Transcode terminal encoding on the fly (in-band; hack): ttyconv is a hack I wrote many years ago (initially in C, later redone in Python) which uses libiconv to transcode terminal I/O. It spawns a new pseudoterminal and (a) transcodes the characters you type from your local encoding into the remote encoding, and (b) transcodes the characters you receive from the remote encoding to your local encoding. I used it to talk to servers that used encodings not supported by the standard Linux terminals. Please note that all of the remote encodings I tested it with were single-byte encodings, so I can't guarantee it'd work for Shift JIS. I don't often find call to use it these days, with most systems switching to Unicode.

This is how you would use it:

$ ttyconv -rsjis -- tail -f x

The downside of ttyconv is that I wrote it, no-one uses it but me, and it's probably full of bugs. I excel at this. The upside is that it uses libiconv, so if your encoding is unusual, it's your best bet. At last count, ttyconv --list showed 100 supported encodings.