Linux – Extract files with umlaut in 7zip file created under windows to Linux

7-zip, debian-wheezy, encoding, linux, windows

I want to extract a large backup of my hard drive compressed with 7zip under Windows to my Debian Wheezy installation. I'm using the following command line:

7z x -pmypasswordhere file.7z

If the archive contains a file or a folder called Äpfel (German for apples), its name comes out garbled on the Linux hard drive: the umlaut is replaced by the wrong characters.

How can I solve this issue? I tried using the following, but this says that the command line is invalid:

7z x -scsWIN -pmypasswordhere file.7z

…where the -scs switch is explained as: "-scs{UTF-8 | WIN | DOS}: set charset for list files".

I compressed the file on Windows 8, on an NTFS partition, with 7z 9.30 64-bit. The compression level was Ultra, and I encrypted both the file names and their contents with AES-256. My Debian Wheezy installation is German, so echo $LANG prints "de_DE.UTF-8".

Best Answer

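For "äpfel" to come out garbled like this, something would have to take äpfel{UTF-8}, treat those bytes as ISO-8859-15, and convert them to UTF-8 a second time. The two UTF-8 bytes of "ä" (0xC3 0xA4) would then each become a character of their own, and you would get "Ã€pfel"{UTF-8}.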
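The double conversion can be reproduced in a few lines of Python. This is only a sketch to illustrate the mechanism; it assumes ISO-8859-15 is the charset involved, which is a guess, and the variable names are made up for the example.

```python
# Sketch: UTF-8 bytes misread as ISO-8859-15 and encoded to UTF-8 again.
name = "\u00e4pfel"  # "äpfel", written with an escape so the source stays ASCII

utf8_bytes = name.encode("utf-8")             # what a UTF-8 system stores
garbled = utf8_bytes.decode("iso-8859-15")    # every byte becomes one character

# Doing the same steps backwards recovers the original name.
repaired = garbled.encode("iso-8859-15").decode("utf-8")

# ascii() keeps the output readable whatever the terminal encoding is.
print(utf8_bytes.hex())   # c3a47066656c
print(ascii(garbled))     # '\xc3\u20acpfel'  (A-tilde, euro sign, "pfel")
print(ascii(repaired))    # '\xe4pfel'
```

If the real charset were ISO-8859-1 instead, byte 0xA4 would show up as the currency sign rather than the euro sign, but the pattern (one umlaut turning into two characters) would be the same.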

So how can this happen? (There appears to be no ISO-8859-1[5] (Latin1) in your workflow).

I believe I could reproduce this on a VFAT or NTFS partition using the mount iocharset=value option. If I set it to ISO-8859-15 and had a locale of UTF-8, then maybe the system could be tricked into converting filenames "in the wrong direction".

But here, your Wheezy installation is most likely on ext3, and I'm not aware of an NLS option for ext3.

Another possibility is that the files are actually correctly created, and you're just seeing them wrong:

  • is PuTTY set to use UTF-8?
  • are your FTP server (and client) set to UTF-8?
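One way to take the terminal and the FTP client out of the picture is to look at the raw bytes of the names on disk. The following is a minimal sketch; the function names describe and list_raw_names are made up for this example.

```python
import os


def describe(raw_name):
    """Say how the raw bytes of a file name decode, independent of the terminal."""
    try:
        text = raw_name.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid UTF-8"
    # ascii() escapes every non-ASCII character, so a wrongly configured
    # terminal cannot distort what is shown.
    return "valid UTF-8: " + ascii(text)


def list_raw_names(directory="."):
    """Return (raw bytes, description) for every entry in the directory."""
    # Passing a bytes path makes os.listdir() return the names as bytes.
    return [(raw, describe(raw)) for raw in os.listdir(os.fsencode(directory))]


if __name__ == "__main__":
    for raw, info in list_raw_names("."):
        print(raw, info)
```

A correctly stored "äpfel" would be reported as '\xe4pfel'. If the escaped name instead shows two characters where the umlaut should be, the name really is wrong on disk and the display is not to blame.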

I notice another strange thing: your two apple files, the one at 16:10 and the one at 16:34, appear to be displayed by ls using two different date formats. In one case, the year is specified.

It might be that 7z is creating a slightly unusual inode entry?

However, if the names really are wrong on disk, the convmv utility, which converts file names from one encoding to another, might be of help.
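To show the idea behind such a repair, here is a small Python stand-in. It is not convmv and does not use its options; it assumes the names were double-encoded through ISO-8859-15, and the function names undo_double_encoding and repair_tree are hypothetical. It only reports what it would rename unless dry_run is set to False.

```python
import os


def undo_double_encoding(name, wrong_charset="iso-8859-15"):
    """Return the repaired name, or None if the name does not look double-encoded."""
    try:
        repaired = name.encode(wrong_charset).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the name has characters outside the assumed charset, or the
        # resulting bytes are not UTF-8: leave such names alone.
        return None
    return repaired if repaired != name else None


def repair_tree(root, dry_run=True):
    """Collect (old, new) paths under root; rename them only if dry_run is False."""
    changes = []
    # Walk bottom-up so that entries are renamed before their parent directory.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            repaired = undo_double_encoding(name)
            if repaired is None:
                continue
            src = os.path.join(dirpath, name)
            dst = os.path.join(dirpath, repaired)
            changes.append((src, dst))
            # Never overwrite an existing entry.
            if not dry_run and not os.path.exists(dst):
                os.rename(src, dst)
    return changes
```

Calling repair_tree on the extraction directory with the default dry_run=True returns the list of planned renames, which is worth checking on a copy of the data before anything is actually changed.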
