Sorry for asking a question similar to my previous one. The difference this time is that the problem is in a zip archive: the Chinese characters in the names of the compressed files are not displayed correctly, either after extraction or when listing the contents of the archive:
$ unzip -l "严蔚敏数据结构(c语言版)教材及答案.zip"
Archive:  严蔚敏数据结构(c语言版)教材及答案.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
    25600  2000-01-04 23:27   ?+?+i- ??-?.doc
    80896  2000-01-04 23:27   ?+??i- -+.doc
    41984  2000-01-04 23:27   ?++?i- i+????-?.doc
    52224  2000-01-04 23:27   ?+?+i- ??i?.doc
    50688  2000-01-04 23:27   ?+??i- ??????.doc
    54272  2000-01-04 23:27   ?++?i- -????-??????.doc
    26112  2000-01-04 23:27   ?+?-i- ?????????_+?.doc
    76288  2000-01-04 23:27   ?+-?i- -??-????-?.doc
    53760  2000-01-04 23:27   ?+-?i- -+?+++?=.doc
    53760  2000-01-04 23:27   ?+--i- ??.doc
  7929077  2009-02-26 22:49   -???????+C????+??+?+?+pdf.pdf
---------                     -------
  8444661                     11 files
How can I deal with this problem?
Thanks and regards!
Update:
I have uploaded this zip archive, and it can be downloaded from http://www.mediafire.com/?dw87ee72m56evy9
I tried to use chardet to determine the encoding of the names of the compressed files by:
$ unzip -l "严蔚敏数据结构(c语言版)教材及答案.zip" | chardet
<stdin>: utf-8 (confidence: 0.99)
But are the file names really encoded in UTF-8? Aren't they supposed to be in some other encoding? I suspect chardet is being fed all of the output of unzip -l rather than just the file names. How can I pick out only the file names from that output and pass them to chardet?
Best Answer
Try:
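One option, given here only as a minimal sketch, is to read the stored names with Python's zipfile module and decode them yourself. It assumes the names were written in GBK, which is common for archives created on Chinese-language Windows but is not confirmed for this file. The helper names recover_name and list_names are made up for illustration. When an entry does not have the UTF-8 flag set (general-purpose bit 11), zipfile decodes its name as CP437 by default, so encoding back to CP437 should return the original bytes, which can then be decoded with the assumed encoding.

```python
import sys
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: the name is stored as UTF-8


def recover_name(filename, flag_bits, encoding="gbk"):
    """Undo zipfile's CP437 fallback and decode the raw bytes as `encoding`.

    `encoding` is an assumption (GBK here); change it if the result looks wrong.
    """
    if flag_bits & UTF8_FLAG:
        return filename  # zipfile already decoded this name as UTF-8
    raw = filename.encode("cp437")  # recover the bytes as stored in the archive
    return raw.decode(encoding, errors="replace")


def list_names(path, encoding="gbk"):
    """Return the member names of the archive at `path`, re-decoded."""
    with zipfile.ZipFile(path) as zf:
        return [recover_name(info.filename, info.flag_bits, encoding)
                for info in zf.infolist()]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python listnames.py archive.zip
    for name in list_names(sys.argv[1]):
        print(name)
```

If the printed names are readable Chinese, the GBK guess was probably right. The raw bytes from filename.encode("cp437") are also what you would want to hand to chardet, rather than the whole unzip -l listing, which mixes in the UTF-8 archive name and the date columns.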