How to display Chinese characters correctly on a remote Red Hat machine

character-encoding input-method unicode

I am using Ubuntu 14.04 to connect to a remote host, whose kernel version is:

Linux version 2.6.32-431.11.5.el6.yyyzzz.x86_64 (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Thu Jul 3 09:42:34 CST 2014

Files I upload to that machine don't display Chinese characters correctly.
When I open a file and type some random Chinese characters with the Ubuntu ibus input method, it shows:

~R~V�~K~B~I~W个~I~N~T�饭~T~E

I searched online and tried the following two methods:

1: examine the locale

It shows:

LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=en_US.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=

That looks fine to me.

2: install Chinese Language support package

I did:

yum install "@Chinese Support"

It installed 178M of files on that machine.

After that, I opened another file and tried typing some Chinese with ibus, but the problem remains. How can I solve it?


Update 1
I did some more research afterwards and found that some characters can be typed correctly (via the ibus Pinyin input method), such as:

起 度 顿 客

They all match the Pinyin I typed, but an automatically generated space (not typed by me) appears after each character.

If I try to type 启, 杜, 盾, 刻 (which have the same Pinyin as the four characters above), I get:

�~P�~]~\ ~[� ~H�

In my experience, when the encoding conversion is totally messed up, typing a Pinyin produces weird characters that look like Chinese but actually are not, and they never correspond to the Pinyin I typed.

This time, things are a little different: I can type some characters correctly (with a system-generated space after each), while others are indecipherable.

Best Answer

Basically, this may be a mismatch between your locale, which is set to UTF-8, and the encoding of your Chinese text file, which may be gbk, gb2312, gb18030, or Big-5.

All of the encodings listed above are incompatible with UTF-8 for anything outside the ASCII range.

Now, let's assume your file is encoded in gbk. When you display its contents, the gbk bytes are interpreted as UTF-8, which produces the gibberish.
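The mismatch described above can be reproduced without the original file. The following is only a rough sketch: it assumes an `iconv` that knows the `GBK` charset (as glibc's does), and the sample string and temporary file are made up for illustration.

```shell
# Hypothetical sample text; any non-ASCII Chinese string would do.
sample='启杜盾刻'
tmp=$(mktemp)

# Write the sample as GBK bytes, as a gbk-encoded file would contain.
printf '%s' "$sample" | iconv -f UTF-8 -t GBK > "$tmp"

# Ask iconv to read those bytes as UTF-8. It exits non-zero on an
# invalid sequence, which is what a UTF-8 terminal runs into as well.
if iconv -f UTF-8 -t UTF-8 "$tmp" > /dev/null 2>&1; then
    utf8_valid=yes
else
    utf8_valid=no
fi

# Decoding with the matching source encoding recovers the text.
roundtrip=$(iconv -f GBK -t UTF-8 "$tmp")
rm -f "$tmp"

echo "valid as UTF-8: $utf8_valid"
echo "decoded as GBK: $roundtrip"
```

If the second line prints readable Chinese while the first reports `no`, the file is most likely in a legacy encoding rather than UTF-8.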

Here are some possible solutions.

  • Use luit. (Preferred)

    $ whatis luit
    luit (1)             - Locale and ISO 2022 support for Unicode terminals
    

    luit -encoding gbk cat a_chinese_file.txt

Since most (if not all) encodings in use are compatible with ASCII, if you only need ASCII plus one other encoding, you can also use either of the following two methods.

  • Change the encoding of your terminal

    You may want to consider this, since it does not require any additional package to be installed.

  • Change Your locale

    However, I think this requires the corresponding locale to be installed.
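Before changing the locale, it may help to check whether a matching one is installed at all. This is a minimal sketch using `locale -a`; the `zh_CN` prefix is an assumption (a Big-5 file would more likely want `zh_TW`), and the exact locale names vary between systems.

```shell
# List installed locales whose names start with zh_CN.
# "|| true" keeps the script going when nothing matches.
available=$(locale -a 2>/dev/null | grep -i '^zh_CN' || true)

if [ -n "$available" ]; then
    status=found
    echo "Installed zh_CN locales:"
    echo "$available"
else
    status=missing
    echo "No zh_CN locale is installed; one would have to be installed or generated first."
fi
```

Only a locale that appears in this list can be selected through `LANG` or `LC_ALL`.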


Some details about the Chinese encodings mentioned above:

  • gbk, gb2312, gb18030 are encodings for Simplified Chinese.

    If you are not sure which encoding your file uses, assume it is gb18030.

    In terms of the number of characters covered: gb18030 > gbk > gb2312, and each larger encoding is a superset of the ones that follow it.

  • Big-5 is the encoding for Traditional Chinese.
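The advice to assume gb18030 follows from the superset relationship: a gb18030 decoder should also read gbk text. Here is a small sketch of that, assuming an `iconv` that supports both `GBK` and `GB18030` (glibc's does); the sample string is hypothetical.

```shell
# Hypothetical sample text, written out as GBK bytes.
sample='启杜盾刻'
tmp=$(mktemp)
printf '%s' "$sample" | iconv -f UTF-8 -t GBK > "$tmp"

# Decode the GBK bytes as GB18030. Because GB18030 extends GBK,
# common characters like these decode to the same text.
decoded=$(iconv -f GB18030 -t UTF-8 "$tmp")
rm -f "$tmp"

echo "$decoded"
```

The same `iconv -f GB18030 -t UTF-8` call can be pointed at a real file of unknown Simplified Chinese encoding to get a UTF-8 copy.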

What's more, the encoding for Simplified Chinese is sometimes referred to as CP936 (Code Page 936; I think this name comes from Windows).
