Convert a Windows-created ZIP to Linux (internal paths issue)


I have a .zip created on a Windows machine (outside of my control). The zip file contains paths that I need to preserve when I unzip.

However, when I unzip, all files end up like:
unzip_dir/\window\path\separator\myfile.ext

I've tried both with and without the -j option.
My issue is that I need that path information under \window\path\separator\. I need that file structure to be created when I unzip.

I can mv the file and flip the \ to / easily enough in a script, but then there are errors that the destination path directories do not exist. My workaround for now is to mkdir -p the paths (after converting \ to /) and then cp the files to those paths.

But there are a lot of files, and these redundant mkdir -p statements for every file really slow things down.

Is there any more elegant way to convert a zip file with Windows paths to Linux paths?

Best Answer

I think something went wrong with the creation of the zip file, because when I create a zip file on Windows it has (portable) forward slashes:

zip.exe -r pip pip
updating: pip/ (244 bytes security) (stored 0%)
  adding: pip/pip.log (164 bytes security) (deflated 66%)
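
To check what a given archive actually stores, one option is to list the member names as they appear in the zip. The following is a minimal sketch using Python's standard zipfile module; the function name backslash_members is made up for illustration. It is meant to be run on Linux: as far as I know, zipfile on Windows converts backslashes in member names to forward slashes on its own, which would hide the problem.

```python
import zipfile


def backslash_members(zip_path):
    """Return the stored member names that contain a backslash."""
    with zipfile.ZipFile(zip_path) as archive:
        # namelist() gives the names as recorded in the archive
        return [name for name in archive.namelist() if "\\" in name]
```

An empty list means the archive already uses forward slashes and a plain unzip should recreate the directories.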

But now that you have the files with file names that contain "paths" with backslashes, you can run the following program in unzip_dir:

#! /usr/bin/env python3

import os

# Directories created so far, remembered so that makedirs is
# attempted only once per directory.
made_dirs = set()

for root, dir_names, file_names in os.walk('.'):
    for file_name in file_names:
        if '\\' not in file_name:
            continue  # ordinary file name, nothing to convert
        # flip the separators and cut off any leading separator
        alt_file_name = file_name.replace('\\', '/').lstrip('/')
        alt_dir_name = os.path.dirname(alt_file_name)
        if alt_dir_name:  # empty for names like '\myfile.ext'
            print('alt_dir', alt_dir_name)
            full_dir_name = os.path.join(root, alt_dir_name)
            if full_dir_name not in made_dirs:
                # exist_ok: the directory may already be on disk
                os.makedirs(full_dir_name, exist_ok=True)
                made_dirs.add(full_dir_name)
        os.rename(os.path.join(root, file_name),
                  os.path.join(root, alt_file_name))

This handles files in any directory under the directory from where the program is started. Given the problem that you describe, the unzip_dir probably doesn't have any subdirectories to start with, and the program could just walk over the files in the current directory only.
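
A different route, not part of the program above, is to skip unzip and let Python read the archive directly, converting each member name while extracting. The following is a rough sketch under some assumptions: it runs on Linux with Python 3, the function name extract_with_posix_paths is invented for this example, and it does not restore permissions or timestamps. Names containing ".." are skipped so that nothing is written outside the destination.

```python
import os
import shutil
import zipfile


def extract_with_posix_paths(zip_path, dest_dir):
    """Extract zip_path into dest_dir, treating '\\' in names as '/'."""
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            # flip the separators and drop any leading separator
            rel_path = info.filename.replace("\\", "/").lstrip("/")
            parts = [p for p in rel_path.split("/") if p not in ("", ".")]
            if not parts or ".." in parts:
                continue  # skip empty names and anything climbing upward
            target = os.path.join(dest_dir, *parts)
            if rel_path.endswith("/"):
                os.makedirs(target, exist_ok=True)  # directory entry
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            # stream the member's contents to its converted path
            with archive.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            extracted.append(target)
    return extracted
```

Called as extract_with_posix_paths("archive.zip", "unzip_dir") (both names are placeholders), this would create unzip_dir/window/path/separator/myfile.ext for a member stored as \window\path\separator\myfile.ext, with no rename pass afterwards.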
