I am using Ubuntu 14.04 to connect to a remote host, whose kernel version is:
Linux version 2.6.32-431.11.5.el6.yyyzzz.x86_64 (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Thu Jul 3 09:42:34 CST 2014
Files I upload to that machine don't display Chinese characters correctly.
When I open a file there and type some random Chinese characters with the Ubuntu ibus input method, it shows:
~R~V�~K~B~I~W个~I~N~T�饭~T~E
I searched online and tried the following 2 methods:
1: examine the locale
It shows:
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=en_US.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=
That seems fine.
2: install the Chinese language support package
I ran:
yum install "@Chinese Support"
It installed 178 MB of files on that machine.
After that, I opened another file and tried typing some Chinese with ibus, but the problem remains. How can I solve it?
Update 1
I did some more research afterwards and found that some characters can be typed correctly (via the Pinyin input method, ibus), like:
起 度 顿 客
They all correspond to their Pinyin, but an automatically generated space (not typed by me) appears after each character.
If I try to type 启, 杜, 盾, 刻 (which have the same Pinyin as the four characters above), I get:
�~P�~]~\ ~[� ~H�
In my experience, if the encoding conversion is completely messed up, typing a Pinyin produces weird characters that look like Chinese but are not, and they never correspond to the Pinyin I typed.
This time things are a little different: I can type some characters correctly (with a system-generated space), while others are indecipherable.
Best Answer
Basically, this may be a mismatch between your locale, which is set to UTF-8, and the encoding of your Chinese text file, which may be encoded in gbk, gb2312, gb18030, or Big-5.
All of the encodings listed above are incompatible with UTF-8.
Now, let's assume gbk is the encoding of your file. When you try to show the contents of the file, a gbk-encoded file is interpreted as a UTF-8 file, which causes the gibberish.
Here comes the solution.
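The mismatch described above can be reproduced in a few lines of Python. This is only a minimal sketch, assuming the file really is gbk-encoded; the sample string reuses the four characters from the question, and the variable names are made up for illustration.

```python
# Sketch: what happens when gbk bytes are read as if they were UTF-8.
# Assumption: the file is gbk-encoded; the sample text comes from the question.
text = "启杜盾刻"

gbk_bytes = text.encode("gbk")      # bytes as they would be stored in a gbk file
utf8_bytes = text.encode("utf-8")   # bytes a UTF-8 terminal expects for the same text

# Decoding gbk bytes as UTF-8 cannot give back the text; byte sequences that
# are invalid in UTF-8 are replaced with U+FFFD (the replacement character).
misread = gbk_bytes.decode("utf-8", errors="replace")

# Decoding with the matching encoding recovers the original text.
recovered = gbk_bytes.decode("gbk")

print(repr(misread))
print(recovered)
```

The same text is stored as different bytes in the two encodings, so only a decoder that matches the file's encoding can turn the bytes back into readable characters.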
Use luit. (Preferred)
luit -encoding gbk cat a_chinese_file.txt
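If luit is not available, roughly the same conversion can be done offline. The following is a hedged sketch, not a description of what luit does internally: it reads a file in an assumed legacy encoding and returns text that a UTF-8 terminal can print. The function name read_legacy_text and the gbk default are assumptions for illustration.

```python
def read_legacy_text(path, encoding="gbk"):
    """Read a file stored in a legacy encoding and return its text as a str."""
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of
    # raising, so a wrong guess shows up as visible damage rather than a crash.
    with open(path, "r", encoding=encoding, errors="replace") as handle:
        return handle.read()
```

Calling print(read_legacy_text("a_chinese_file.txt")) on a UTF-8 terminal would then show the characters, provided the encoding guess is right.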
Since most (if not all) encodings in use are compatible with ASCII, if you only need ASCII characters plus one other encoding, you can also use either of the following two methods.
Change the encoding of your terminal
You may consider this, since it does not require any additional package to be installed.
Change your locale
But I think this requires you to install the corresponding locale.
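Whether the corresponding locale is installed can be checked before switching to it. Here is a small sketch under stated assumptions: the helper name locale_available is hypothetical, and the locale names in the loop (zh_CN.GBK, zh_CN.GB18030, zh_TW.Big5) are the usual glibc spellings, which may differ on your system.

```python
import locale


def locale_available(name):
    """Return True if the C library accepts `name` as an LC_CTYPE locale."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        # Always restore the setting that was active before the probe.
        locale.setlocale(locale.LC_CTYPE, saved)


# Assumed locale names; adjust to whatever `locale -a` lists on your host.
for candidate in ("zh_CN.GBK", "zh_CN.GB18030", "zh_TW.Big5"):
    print(candidate, locale_available(candidate))
```

A False result suggests the locale would have to be generated or installed first.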
Some details about the Chinese encodings mentioned above:
gbk, gb2312, and gb18030 are encodings for Simplified Chinese.
If you are not sure which encoding your file uses, assume it is gb18030.
The number of characters covered by each encoding is ordered as gb18030 > gbk > gb2312, and each larger encoding is a superset of the ones that follow.
Big-5 is the encoding for Traditional Chinese.
What's more, the encoding for Simplified Chinese is sometimes referred to as CP936 (Code Page 936; I think this name comes from Windows).
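The superset relationship and the CP936 name can be checked with Python's built-in codecs. This is an illustrative sketch only: the three sample characters are my own picks, not from the discussion above, and the helper can_encode is a hypothetical name.

```python
import codecs


def can_encode(text, encoding):
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Sample characters chosen for illustration (assumptions, not from the answer):
common = "中"            # a common simplified character
traditional = "龍"       # a traditional character, outside gb2312
rare = "\U00020000"      # a CJK Extension B character, outside gbk

for sample in (common, traditional, rare):
    support = {enc: can_encode(sample, enc) for enc in ("gb2312", "gbk", "gb18030")}
    print(repr(sample), support)

# Python treats cp936 as another name for its gbk codec.
print(codecs.lookup("cp936").name)
```

Because gb18030 can represent everything the other two can, decoding an unknown Simplified Chinese file as gb18030 is the safest first guess.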