I want to tail -f a file, but its content is in SJIS encoding, so I need to have it converted to the native (UTF-8) encoding of my terminal.
When I do

tail -f x | iconv -fsjis

there is no output. Since

tail x | iconv -fsjis

does work, at first I thought it was a buffering issue, but trying unbuffer and stdbuf as described in "Turn off buffering in pipe" did not help.

In fact, even after more than 10 KB of data had been appended to x, there was still no output, so I guess it is not a buffering issue (the buffer is 4 KB, if I'm not mistaken). Rather, iconv seems to start producing output only once it receives an EOF.

So how can I tail-follow my SJIS-encoded file?
Best Answer
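(Take this with a pinch of salt.) As far as I remember, the problem lies in the way libiconv works. Multi-byte encodings need a state machine to decode them, and libiconv prefers to receive entire characters, so you can't just give it half a character in one function call and the other half in the next. The iconv program shipped with glibc, for instance, has traditionally dealt with this by reading its entire input into memory before converting anything, which is why nothing appears until EOF.

I can think of three other solutions: a good out-of-band method, and two in-band ones, the second of which is a hack.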
Change the terminal emulator's encoding (out-of-band): change the character encoding in your terminal emulator so that its native encoding is Shift JIS. I just checked konsole, and it supports this: from the menu, View→Character encoding→Japanese→sjis. You can then just tail -f the file, and konsole will take care of decoding the multibyte characters and matching them up to font glyphs.

Transcode terminal encoding on the fly (in-band; best): courtesy of Gilles, who reminded me of luit after a very long time. luit should have come with your X.Org distribution (on Debian, it's in package x11-utils). Run it like this:

luit -encoding SJIS tail -f x

This runs tail -f x under luit, which transcodes between SJIS and your terminal's encoding. The downside of luit is that it doesn't support the wealth of encodings supported by libiconv. The upside is that it's available almost everywhere.

Transcode terminal encoding on the fly (in-band; hack): ttyconv is a hack I wrote many years ago (initially in C, later redone in Python) which uses libiconv to transcode terminal I/O. It spawns a new pseudoterminal and (a) transcodes the characters you type from your local encoding into the remote encoding, and (b) transcodes the characters you receive from the remote encoding into your local encoding. I used it to talk to servers that used encodings not supported by the standard Linux terminals. Please note that all of the remote encodings I tested it with were single-byte encodings, so I can't guarantee it would work for Shift JIS. I rarely have a need for it these days, with most systems switching to Unicode. This is how you would use it:
The downside of ttyconv is that I wrote it, no one uses it but me, and it's probably full of bugs. I excel at this. The upside is that it uses libiconv, so if your encoding is unusual, it's your best bet. At last count, ttyconv --list showed 100 supported encodings.
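If none of these tools is at hand, the whole-character requirement described above suggests a plain-shell workaround: feed iconv one complete line at a time, starting a fresh iconv for each line so that every invocation sees EOF and writes its output immediately. This is a minimal sketch, not part of the original answer. The function name sjis_lines is hypothetical, and the sketch assumes the file uses LF line endings and that your iconv accepts the name SJIS (glibc and GNU libiconv both do). Because the newline byte 0x0A never occurs inside a Shift JIS character, splitting on newlines cannot cut a character in half.

```shell
# Hypothetical helper (not from the original answer): convert SJIS text
# to UTF-8 one complete line at a time, so iconv never waits for EOF.
# Assumption: LF line endings. 0x0A is never a Shift JIS trail byte,
# so each line handed to iconv contains only whole characters.
sjis_lines() {
    # IFS= keeps leading/trailing blanks; -r keeps backslashes, which matters
    # because 0x5C can be the second byte of a Shift JIS character.
    # "|| [ -n "$line" ]" also converts a final line lacking a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # A separate iconv per line: it reaches EOF and flushes right away.
        printf '%s\n' "$line" | iconv -f SJIS -t UTF-8
    done
}

# Usage: follow the file x and decode each line as it arrives.
# tail -f x | sjis_lines
```

The cost is one iconv process per line, which is fine for a slowly growing log but wasteful for a fast one. A line that is still being written is held back until its newline arrives.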