I have several ZIP and RAR archives in which the filenames are scrambled: they contain characters that are invalid on the filesystem, such as ?, * or !, or file and directory names so long that the usual archiver tools get confused or simply fail to create the file. Since only the content matters, I would just like to extract the files from these archives into a single directory, in a flat structure, with generic names like file0, file1, file2, etc. What's the simplest way to do that?
How to decompress files from an archive ignoring the file names
archive filenames
Related Solutions
You can pass the -S option to use a suffix other than .gz.
gunzip -S .compressed file.compressed
If you want the uncompressed file to have some other name, run
gzip -dc <compressed-file >uncompressed-file
gunzip <compressed-file >uncompressed-file
(these commands are equivalent).
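For comparison, the same "decompress to a name of your choice" idea can be sketched with Python's standard gzip module. This is only an illustration under the assumption that Python is available; the function name gunzip_to and the file names are made up for the example.

```python
import gzip
import shutil


def gunzip_to(src_path, dest_path):
    """Decompress a gzip file (whatever its suffix) into dest_path.

    Hypothetical helper: like `gzip -dc <src >dest`, the output name is
    chosen by the caller and the name stored in the archive is ignored.
    """
    with gzip.open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        # Stream in chunks so large files are not held in memory.
        shutil.copyfileobj(src, dest)
```

A call such as gunzip_to("file.compressed", "uncompressed-file") leaves the compressed file untouched, just as the redirection forms do.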
Normally unzipping restores the name and date of the original file (when it was compressed); this doesn't happen with -c.
If you want the compressed file and the uncompressed file to have the same name, you can't do it directly: you need to either rename the compressed file or rename the uncompressed file. In particular, gzip removes and recreates its target file, so if you need to modify the file in place because you don't have write permission in the directory, you need to use -c or redirection.
cp somefile /tmp
gunzip </tmp/somefile >|somefile
Note that gunzip <somefile >somefile will not work, because the gunzip process would see a file truncated to 0 bytes when it starts reading. If you could avoid the truncation, then gunzip would feed back on its own output; either way, this one can't be done in place.
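The copy-then-redirect approach can also be sketched in Python: read and decompress everything first, and only then reopen the same path for writing. This is a minimal sketch, assuming the decompressed data fits in memory; gunzip_in_place is a hypothetical name.

```python
import gzip


def gunzip_in_place(path):
    """Replace the gzip data stored at `path` with its decompressed content.

    Hypothetical helper. The whole file is decompressed into memory
    before the path is opened for writing, so the reader never sees
    the truncated file.
    """
    with gzip.open(path, "rb") as src:
        data = src.read()
    # Mode "wb" truncates and rewrites the existing file; no new file
    # is created in the directory.
    with open(path, "wb") as dest:
        dest.write(data)
```

For files too large for memory, a temporary copy elsewhere (as with cp to /tmp above) remains the safer route.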
When searching for a single file in a large archive, unzip uses method 1, which you can see using strace:
open("dataset.zip", O_RDONLY) = 3
ioctl(1, TIOCGWINSZ, 0x7fff9a895920) = -1 ENOTTY (Inappropriate ioctl for device)
write(1, "Archive: dataset.zip\n", 22Archive: dataset.zip
) = 22
lseek(3, 943718400, SEEK_SET) = 943718400
read(3, "\340P\356(s\342\306\205\201\27\360U[\250/2\207\346<\252+u\234\225\1[<\2310E\342\274"..., 4522) = 4522
lseek(3, 943722880, SEEK_SET) = 943722880
read(3, "\3\f\225P\\ux\v\0\1\4\350\3\0\0\4\350\3\0\0", 20) = 20
lseek(3, 943718400, SEEK_SET) = 943718400
read(3, "\340P\356(s\342\306\205\201\27\360U[\250/2\207\346<\252+u\234\225\1[<\2310E\342\274"..., 8192) = 4522
lseek(3, 849346560, SEEK_SET) = 849346560
read(3, "D\262nv\210\343\240C\24\227\344\367q\300\223\231\306\330\275\266\213\276M\7I'&35\2\234J"..., 8192) = 8192
stat("rand-28.txt", 0x559f43e0a550) = -1 ENOENT (No such file or directory)
lstat("rand-28.txt", 0x559f43e0a550) = -1 ENOENT (No such file or directory)
stat("rand-28.txt", 0x559f43e0a550) = -1 ENOENT (No such file or directory)
lstat("rand-28.txt", 0x559f43e0a550) = -1 ENOENT (No such file or directory)
open("rand-28.txt", O_RDWR|O_CREAT|O_TRUNC, 0666) = 4
ioctl(1, TIOCGWINSZ, 0x7fff9a895790) = -1 ENOTTY (Inappropriate ioctl for device)
write(1, " extracting: rand-28.txt "..., 37 extracting: rand-28.txt ) = 37
read(3, "\275\3279Y\206\223\217}\355W%:\220YNT\0\257\260z^\361T\242\2\370\21\336\372+\306\310"..., 8192) = 8192
unzip opens dataset.zip, seeks to the end, then seeks to the start of the requested file in the archive (rand-28.txt, at offset 849346560) and reads from there.
The central directory is found by scanning the last 65557 bytes of the archive; see the code starting here:
/*---------------------------------------------------------------------------
Find and process the end-of-central-directory header. UnZip need only
check last 65557 bytes of zipfile: comment may be up to 65535, end-of-
central-directory record is 18 bytes, and signature itself is 4 bytes;
add some to allow for appended garbage. Since ZipInfo is often used as
a debugging tool, search the whole zipfile if zipinfo_mode is true.
---------------------------------------------------------------------------*/
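The tail scan described in that comment can be illustrated with a short Python sketch. It is not UnZip's code: it assumes a plain (non-Zip64) archive, takes the last occurrence of the end-of-central-directory signature in the final 65557 bytes, and uses made-up names (find_eocd, TAIL_SIZE).

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"  # end-of-central-directory signature
TAIL_SIZE = 65557               # figure quoted in the UnZip comment
EOCD_FIXED_SIZE = 22            # fixed part of the record, signature included


def find_eocd(path):
    """Locate the end-of-central-directory record near the end of a ZIP.

    Returns a dict with the record's offset, the entry count, and the
    central directory's size and offset, or None if no record is found.
    """
    with open(path, "rb") as f:
        f.seek(0, 2)                       # seek to the end to get the size
        size = f.tell()
        start = max(0, size - TAIL_SIZE)
        f.seek(start)
        tail = f.read()

    pos = tail.rfind(EOCD_SIGNATURE)       # last match wins
    if pos < 0 or pos + EOCD_FIXED_SIZE > len(tail):
        return None

    fields = struct.unpack("<4sHHHHIIH", tail[pos:pos + EOCD_FIXED_SIZE])
    _sig, _disk, _cd_disk, _n_here, n_total, cd_size, cd_offset, _clen = fields
    return {
        "eocd_offset": start + pos,
        "entries": n_total,
        "cd_size": cd_size,
        "cd_offset": cd_offset,
    }
```

Once cd_offset is known, a reader can jump straight to the central directory and from there to a single member, which matches the seek pattern in the strace output.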
Best Answer
There is a Perl script written by Daniel S. Sterling available at https://gist.github.com/eqhmcow/5389877 (referenced from IO::Uncompress::Unzip) that looks like it could almost do what you need.
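Independently of that Perl script, the idea of ignoring the stored names can be sketched with Python's standard zipfile module. This is a sketch, not a tested tool for damaged archives: it handles ZIP only (the standard library has no RAR support), it assumes zipfile can parse the archive, and extract_flat is a hypothetical name.

```python
import shutil
import zipfile
from pathlib import Path


def extract_flat(archive_path, dest_dir, start=0):
    """Write every file entry to dest_dir as file<N>, ignoring stored names.

    Returns the next unused number so that several archives can share
    one output directory without name clashes.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    counter = start
    with zipfile.ZipFile(archive_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no content
            target = dest / f"file{counter}"
            # The data is read through the entry object, so the stored
            # name is never used as a filesystem path.
            with archive.open(info) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            counter += 1
    return counter
```

Chaining calls, for example n = extract_flat("a.zip", "out") followed by extract_flat("b.zip", "out", start=n), keeps the numbering continuous across archives.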
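To adapt the Perl script, a file counter is declared alongside $status:
my $status; my $filenumber = 0;
Some lines are then commented out (put # at the beginning of each line), and the destination name is built from the counter:
my $destfile = "file" . $filenumber++;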
The entire script, with these modifications, is presented below for reference: