I have reformulated your questions a bit, for reasons that should
appear evident when you read them in sequence.
1. Is it possible to configure a Linux filesystem to use a fixed character encoding to store file names, regardless of the LANG/LC_ALL environment?
No, this is not possible: as you mention in your question, a UNIX file
name is just a sequence of bytes; the kernel knows nothing about
the encoding, which is entirely a user-space (i.e., application-level)
concept.
In other words, the kernel knows nothing about LANG/LC_*, so it cannot
translate.
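A quick way to see this from user space is to store the same text under two different encodings and look at what ends up on disk. The sketch below assumes a Linux filesystem that accepts arbitrary bytes in names (ext4 or tmpfs, for instance); the function name and the sample text are made up for illustration.

```python
import os
import tempfile


def create_same_text_two_encodings(directory: bytes, text: str = "中文.txt") -> list[bytes]:
    """Create one empty file per encoding of `text`; return the raw names on disk."""
    for encoding in ("utf-8", "gbk"):
        raw_name = text.encode(encoding)  # the kernel only ever sees these bytes
        with open(os.path.join(directory, raw_name), "wb"):
            pass
    # Passing a bytes path makes os.listdir return undecoded bytes names.
    return sorted(os.listdir(directory))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for raw in create_same_text_two_encodings(os.fsencode(tmp)):
            print(raw)
```

The directory ends up with two unrelated entries: to the kernel they are simply two different byte strings, whatever LANG says.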
2. Is it possible to let different file names refer to the same file?
You can have multiple directory entries referring to the same file;
you can create them with hard links or symbolic links.
Be aware, however, that file names that are not valid in the
current encoding (e.g., your GBK character string when you're working
in a UTF-8 locale) will display badly, if at all.
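As a minimal sketch of the two options, the helper below gives an existing file one extra hard-linked name and one symbolic-link name. The helper name and the file names in the demo are hypothetical; it assumes a filesystem that supports both link types and arbitrary bytes in names.

```python
import os
import tempfile


def add_alias_names(directory: bytes, existing: bytes, hard: bytes, soft: bytes) -> None:
    """Give the file `existing` two more names inside `directory`."""
    target = os.path.join(directory, existing)
    # Hard link: a second directory entry pointing at the same inode.
    os.link(target, os.path.join(directory, hard))
    # Symbolic link: a small file that stores the old name (relative to `directory`).
    os.symlink(existing, os.path.join(directory, soft))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        d = os.fsencode(tmp)
        gbk_name = "中文.txt".encode("gbk")
        utf8_name = "中文.txt".encode("utf-8")
        with open(os.path.join(d, gbk_name), "wb") as f:
            f.write(b"hello")
        add_alias_names(d, gbk_name, utf8_name, b"ascii-alias.txt")
        print(os.path.samefile(os.path.join(d, gbk_name), os.path.join(d, utf8_name)))
```

Note that the symbolic link stores the GBK bytes of the old name as its target, so it breaks if the original entry is later renamed; the hard link does not.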
3. Is it possible to patch the kernel to translate character encoding between the filesystem and the current environment?
You cannot patch the kernel to do this (see 1.), but you could, in
theory, patch the C library (e.g., glibc) to perform this translation,
and always convert file names to UTF-8 when it calls the kernel, and
convert them back to the current encoding when it reads a file name
from the kernel.
A simpler approach could be to write an overlay filesystem with FUSE,
that just redirects any filesystem request to another location after
converting the file name to/from UTF-8. Ideally you could mount this
filesystem in ~/trans, and when an access is made to
~/trans/a/GBK/encoded/path then the FUSE filesystem really accesses
/a/UTF-8/encoded/path.
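The heart of such an overlay is the name conversion itself. The sketch below shows only that part, with no FUSE bindings: the constants and both function names are hypothetical, and it assumes the caller side uses GBK while the backing filesystem uses UTF-8.

```python
import os

MOUNT_ENCODING = "gbk"      # assumed encoding seen under the mount point
BACKING_ENCODING = "utf-8"  # assumed encoding used on the real filesystem


def to_backing_path(mount_relative: bytes, backing_root: bytes = b"/") -> bytes:
    """Map a GBK-encoded path below the mount point to its UTF-8 backing path."""
    # Raises UnicodeDecodeError if the request is not valid GBK.
    text = mount_relative.decode(MOUNT_ENCODING)
    return os.path.join(backing_root, text.encode(BACKING_ENCODING).lstrip(b"/"))


def to_mount_name(backing_name: bytes) -> bytes:
    """Map a UTF-8 name read from the backing filesystem to the GBK name shown to callers."""
    # Raises UnicodeDecodeError if the on-disk name is not valid UTF-8.
    return backing_name.decode(BACKING_ENCODING).encode(MOUNT_ENCODING)


if __name__ == "__main__":
    requested = "a/中文/文件.txt".encode("gbk")
    print(to_backing_path(requested))
```

Decoding the whole path in one go is safe for GBK because the byte for "/" never occurs inside a multi-byte GBK character; that would not hold for every legacy encoding.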
However, the problem with these approaches is: what do you do with
files that already exist on your filesystem and are not UTF-8 encoded?
You cannot simply pass them through untranslated, because then you don't
know how to convert them; and you cannot mangle them by translating
invalid character sequences to ?, because that could create
conflicts...
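The conflict is easy to reproduce. In this sketch, two distinct byte strings that are not valid UTF-8 collapse to the same name once the invalid sequences are replaced (Python uses U+FFFD rather than a literal ?, but the effect is the same). The function names are made up for the example.

```python
def mangle(raw_name: bytes) -> str:
    """Decode as UTF-8, replacing undecodable bytes with U+FFFD (the '?' approach)."""
    return raw_name.decode("utf-8", errors="replace")


def lossless(raw_name: bytes) -> str:
    """Decode as UTF-8, carrying undecodable bytes through as lone surrogates."""
    return raw_name.decode("utf-8", errors="surrogateescape")


first = "中".encode("gbk") + b".txt"   # GBK bytes, not valid UTF-8
second = "文".encode("gbk") + b".txt"  # different GBK bytes, also not valid UTF-8

# Two distinct files, but the mangled names are compared here for equality.
print(mangle(first) == mangle(second))
# The surrogateescape forms are compared the same way.
print(lossless(first) == lossless(second))
```

The surrogateescape variant is how Python itself keeps undecodable file names distinct and reversible inside a program. It avoids the collision, but it still does not tell you which encoding the original name was written in, so it does not solve the conversion problem.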
opendir and readdir themselves work on bytes. They do not perform any reencoding.
Some filesystem drivers may impose constraints on the byte sequences. For example, HFS+ normalizes file names using a proprietary Unicode normalization scheme. I would expect the form returned by readdir to work when passed to opendir, however, so like the OP in the Ubuntu forum thread that jw013 mentioned, I suspect a bug in the HFS+ driver. It is not the only program that is tripped up by Hangul on HFS+. Even OS X seems to have trouble with Unicode normalization.
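To check the expectation that a name returned by readdir can be handed straight back to opendir, something like the following could be run on the affected volume. It is only a diagnostic sketch with made-up function names. The second helper shows the standard NFC and NFD forms of a string, which merely approximates what HFS+ does, since its scheme is its own.

```python
import os
import tempfile
import unicodedata


def unopenable_subdirs(directory: bytes) -> list[bytes]:
    """Return subdirectory names that the listing reports but that then fail to open."""
    failures = []
    with os.scandir(directory) as entries:  # bytes path in, bytes names out
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue
            try:
                # Reuse the exact bytes the directory listing handed back.
                with os.scandir(entry.path):
                    pass
            except OSError:  # includes "no such file" as well as permission errors
                failures.append(entry.name)
    return failures


def composed_and_decomposed(text: str) -> tuple[bytes, bytes]:
    """UTF-8 bytes of `text` in NFC and in NFD."""
    return (unicodedata.normalize("NFC", text).encode("utf-8"),
            unicodedata.normalize("NFD", text).encode("utf-8"))


if __name__ == "__main__":
    nfc, nfd = composed_and_decomposed("한글")
    print(len(nfc), len(nfd))  # byte lengths of the two forms
    with tempfile.TemporaryDirectory() as tmp:
        d = os.fsencode(tmp)
        os.makedirs(os.path.join(d, nfc), exist_ok=True)
        os.makedirs(os.path.join(d, nfd), exist_ok=True)
        print(unopenable_subdirs(d))
```

On a filesystem that stores names byte for byte, the two Hangul forms are separate directories and both reopen fine. A non-empty result on HFS+ that is not explained by permissions would support the suspicion of a driver bug.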
Best Answer
1- You can run
convmv -f xxx -t yyy --notest .
in the folder that contains the already extracted files/folders. xxx is your Windows encoding (gbk and so on); yyy is your Linux encoding (utf8 and so on).
3- You can use file-roller (Ark is not supported): uninstall the unzip package and install the p7zip-full package.
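If convmv is not available, the same kind of rename can be approximated in a few lines of Python. This is a rough sketch for a single directory, not a description of how convmv works: the function name and defaults are assumptions, and it deliberately skips names that already decode in the target encoding and never overwrites an existing entry.

```python
import os
import tempfile


def reencode_names(directory: bytes, src: str = "gbk", dst: str = "utf-8",
                   dry_run: bool = True) -> list[tuple[bytes, bytes]]:
    """Rename entries of `directory` from `src` to `dst` encoding; return (old, new) pairs."""
    planned = []
    for old in os.listdir(directory):  # bytes path in, bytes names out
        try:
            old.decode(dst)
            continue  # already valid in the target encoding, leave it alone
        except UnicodeDecodeError:
            pass
        try:
            new = old.decode(src).encode(dst)
        except UnicodeError:
            continue  # not valid in the source encoding either
        if os.path.lexists(os.path.join(directory, new)):
            continue  # never overwrite an existing entry
        planned.append((old, new))
        if not dry_run:
            os.rename(os.path.join(directory, old), os.path.join(directory, new))
    return planned


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        d = os.fsencode(tmp)
        open(os.path.join(d, "中文.txt".encode("gbk")), "wb").close()
        print(reencode_names(d))                 # preview only
        print(reencode_names(d, dry_run=False))  # actually rename
        print(os.listdir(d))
```

Running with dry_run=True first gives a preview of what would be renamed before anything is touched.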