I have one or more ZIP files containing files whose names are stored in some legacy encoding. Let's say I know the encoding of those filenames, but I still don't know how to decompress them properly.
Here is an example archive; it contains one file:
"【SSK字幕组】The Vampire Diaries 吸血鬼日记S06E12.ass"
I know the encoding used is GB18030 (Chinese).
The question is: how do I unpack that file on FreeBSD using unzip or another CLI utility so that the filename comes out correctly decoded? I tried everything I could, but the result was never good. Please help.
I tried on OS X:
MBP1:test 2ge$ bsdtar xf gb18030.zip
MBP1:test 2ge$ ls
%A1%BESSK%D7%D6Ļ%D7顿The Vampire Diaries %CE%FCѪ%B9%ED%C8ռ%C7S06E12/ gb18030.zip
MBP1:test 2ge$ cd %A1%BESSK%D7%D6Ļ%D7顿The\ Vampire\ Diaries\ %CE%FCѪ%B9%ED%C8ռ%C7S06E12/
MBP1:%A1%BESSK%D7%D6Ļ%D7顿The Vampire Diaries %CE%FCѪ%B9%ED%C8ռ%C7S06E12 2ge$ ls
%A1%BESSK%D7%D6Ļ%D7顿The Vampire Diaries %CE%FCѪ%B9%ED%C8ռ%C7S06E12.ass*
MBP1:%A1%BESSK%D7%D6Ļ%D7顿The Vampire Diaries %CE%FCѪ%B9%ED%C8ռ%C7S06E12 2ge$ find . | iconv -f gb18030 -t utf-8
.
./%A1%BESSK%D7%D6L抬%D7椤縏he Vampire Diaries %CE%FC血%B9%ED%C8占%C7S06E12.ass
MBP1:%A1%BESSK%D7%D6Ļ%D7顿The Vampire Diaries %CE%FCѪ%B9%ED%C8ռ%C7S06E12 2ge$ convmv -r -f gb18030 -t utf-8 --notest .
Skipping, already UTF-8: ./%A1%BESSK%D7%D6Ļ%D7顿The Vampire Diaries %CE%FCѪ%B9%ED%C8ռ%C7S06E12.ass
Ready!
I tried something similar with unzip, but I get a similar problem.
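The percent signs in the listing above look like escapes for bytes that are not valid UTF-8, while bytes that happen to form valid UTF-8 show up as unrelated characters (for example "顿"). If that reading is right, the intended name is still recoverable from the listing. Below is a minimal Python sketch of that idea. The function name `recover_escaped_name` is hypothetical, and the sketch assumes the filesystem stored the name in decomposed (NFD) form and that the original name contained no literal `%XX` sequences.

```python
import unicodedata
from urllib.parse import unquote_to_bytes


def recover_escaped_name(escaped: str, encoding: str = "gb18030") -> str:
    """Turn a percent-escaped directory entry back into the intended name."""
    # Assumption: the filesystem decomposed some characters (NFD), so
    # recompose them first to get back the characters the raw bytes formed.
    composed = unicodedata.normalize("NFC", escaped)
    # Characters that survived are re-encoded as UTF-8, which restores their
    # original bytes; each %XX escape becomes the single byte it stands for.
    raw = unquote_to_bytes(composed)
    # The restored byte string can now be decoded with the real encoding.
    return raw.decode(encoding)


if __name__ == "__main__":
    listed = "%A1%BESSK%D7%D6Ļ%D7顿The Vampire Diaries %CE%FCѪ%B9%ED%C8ռ%C7S06E12.ass"
    print(recover_escaped_name(listed))
```

This only repairs a name after a bad extraction; extracting with the right encoding in the first place avoids the problem.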
Thanks, I am now trying on FreeBSD, which I connect to over SSH from OS X (Terminal):
# locale
LANG=
LC_CTYPE="C"
LC_COLLATE="C"
LC_TIME="C"
LC_NUMERIC="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=C
The first thing I would like to do is display Chinese names properly. I changed the locale:
setenv LC_ALL zh_CN.GB18030
setenv LANG zh_CN.GB18030
Then I downloaded the file and tried "ls" to see the proper characters, but no luck. So I think I first have to get the Chinese locale working, so that I can verify the result by comparing it with the expected name. Can you also help me with this, please?
Best Answer
Here's what I do on Ubuntu 16.04 to unzip an archive in any encoding, as long as I know what that encoding is. The same method should work on FreeBSD because it only relies on the widely available unzip tool.

I double-check the exact name of the encoding, so as not to misspell it: https://www.iana.org/assignments/character-sets/character-sets.xhtml

I simply run unzip with either the -O or the -I flag followed by the encoding name, choosing between the two according to unzip's help text: -O specifies the character encoding for DOS, Windows and OS/2 archives, and -I specifies it for UNIX and other archives. In practice this means that I simply try -O and it should work, because not a lot of people would create a .zip file on Unix...

So, for your specific example: the exact encoding name is GB18030. I use the -O flag (unzip -O GB18030 gb18030.zip) and... it works.
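If the installed unzip does not accept -O and -I (as far as I know they come from a patch that not every build includes), the same re-decoding can be sketched with Python's standard zipfile module. This is an alternative technique, not what unzip does internally. It assumes that zipfile decodes names without the UTF-8 flag as CP437, so encoding them back to CP437 restores the raw bytes. The helper names `recover_name` and `extract_with_encoding` are hypothetical.

```python
import shutil
import zipfile
from pathlib import Path

UTF8_FLAG = 0x800  # general-purpose bit 11: the name is stored as UTF-8


def recover_name(name: str, flag_bits: int, encoding: str) -> str:
    """Undo zipfile's CP437 guess and decode the raw bytes with `encoding`."""
    if flag_bits & UTF8_FLAG:
        return name  # zipfile already decoded this name as UTF-8
    return name.encode("cp437").decode(encoding)


def extract_with_encoding(zip_path: str, dest: str, encoding: str) -> list:
    """Extract every entry, naming it after its re-decoded filename."""
    dest_dir = Path(dest).resolve()
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            name = recover_name(info.filename, info.flag_bits, encoding)
            target = (dest_dir / name).resolve()
            # Refuse entries that would land outside the destination.
            if target != dest_dir and dest_dir not in target.parents:
                raise ValueError(f"unsafe path in archive: {name!r}")
            if info.is_dir():
                target.mkdir(parents=True, exist_ok=True)
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            # Copy the entry's data under the corrected name.
            with zf.open(info) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            written.append(name)
    return written


if __name__ == "__main__":
    # "gb18030.zip" is the archive from the question; "out" is an arbitrary directory.
    for extracted in extract_with_encoding("gb18030.zip", "out", "gb18030"):
        print(extracted)
```

The files are written with Unicode names, so whether "ls" then shows them correctly depends on the terminal and locale rather than on the archive.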