I need to make a zip file archiving ~100k files from a directory containing ~500k files. I get "argument list too long" errors when I try the obvious commands:
zip archive.zip *pattern*.txt # fails
zip archive.zip `find . -name "*pattern*.txt"` # fails
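For context, the error comes from the operating system, not from zip. The kernel caps the combined size of the arguments and environment handed to a newly executed program, at roughly the value `getconf ARG_MAX` reports. A shell builtin such as `printf` is not executed as a separate program, so it can print the expanded glob without hitting that cap. The rough diagnostic below is a sketch that assumes a shell whose `printf` is a builtin (true of bash). The scratch directory and file names are hypothetical stand-ins for the real ones.

```shell
#!/bin/sh
set -eu

# Scratch directory with a few hypothetical files matching the pattern
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
for i in 1 2 3; do
  touch "x_pattern_$i.txt"
done
touch unrelated.txt   # does not match *pattern*.txt

# Upper bound on argument list + environment for a newly executed program
limit=$(getconf ARG_MAX)

# printf is a shell builtin, so expanding the glob here never triggers
# "argument list too long"; each name is printed followed by a NUL byte
bytes=$(printf '%s\0' *pattern*.txt | wc -c | tr -d ' ')

echo "ARG_MAX: $limit bytes"
echo "matched names need about: $bytes bytes (excluding per-argument pointer overhead)"
```

With the real 100k names, `bytes` would be compared against `limit` to see how far over the cap the glob goes.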
One approach is to use the -@ option to feed a list of files in via stdin:
find . -name "*pattern*.txt" | zip -@ archive.zip
However, the zip man page says:
If a file list is specified as -@ [Not on MacOS], zip takes the list of input files from standard input instead of from the command line.
It's the "Not on MacOS" that is bugging me. I went ahead and tried the -@ option, and it seems to work; but I'm feeling nervous about whether it's really doing the right job (archiving all the files, intact).
Here are my questions:
- Why would -@ not be OK on MacOS?
- Are there some versions of MacOS/bash/zip where this warning is true, and others where it is not? Is this an obsolete warning, and if so, where is the dividing line?
- What would be a viable approach for this problem without using -@?
Note that the solution given in "zip: Argument list too long (80.000 files in overall)" will not work; I need to be archiving some, not all, of the files in the directory.
I'm running Mac OS 10.7.5. Here is some version info:
$ bash --version
GNU bash, version 3.2.48(1)-release (x86_64-apple-darwin11)
$ zip --version
This is Zip 3.0 (July 5th 2008), by Info-ZIP.
...
Compiled with gcc 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00) for Unix (Mac OS X) on Jun 24 2011.
Best Answer
First of all, passing find's output to zip through backtick command substitution, as in your second command, is never a good idea. Filenames can contain spaces, newline characters, parts that could be interpreted as switches, and whatnot.
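To see the word-splitting problem concretely, the sketch below expands find's output unquoted, the way backticks do, and counts the resulting words. It runs in a throwaway directory with hypothetical file names. A single name containing spaces turns into several arguments, so zip would go looking for files that do not exist.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Two matching files; one name contains spaces
touch "x_pattern_1.txt" "my pattern file.txt"

# Number of files find actually reports (one per line; no newlines in these names)
files=$(find . -name "*pattern*.txt" | wc -l | tr -d ' ')

# Unquoted command substitution splits on whitespace, exactly like `...` does
set -- $(find . -name "*pattern*.txt")
words=$#

echo "files: $files, arguments after word splitting: $words"
```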
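To perform an action for every found file, you can use find's -exec switch or xargs.

find . -name "*pattern*.txt" -exec zip archive.zip {} \;

will add the files one by one to the zip file. Here, {} stands for the currently processed file.

find . -name "*pattern*.txt" -exec zip archive.zip {} +

Terminating the -exec argument with + instead of ; causes find to pass several files at once to each zip invocation (as many as it can without generating the same errors you're getting), which should be considerably faster for a large number of files. Because zip adds to an existing archive, splitting the list across several zip invocations still produces a single archive containing every matched file.

find . -name "*pattern*.txt" -print0 | xargs -0 zip archive.zip

does essentially the same. xargs processes several files at once by default.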
The -print0 switch to find and the -0 switch to xargs make them use null characters as file separators, so that strange filenames are handled properly.

I don't know why -@ isn't recommended for Mac OS [1], but find ... | zip -@ will not handle strange filenames (specifically, filenames containing newline characters) properly. This is true regardless of the operating system.

[1] I'm guessing this applies only to Mac OS up to version 9.x, since classic Mac OS used carriage returns as newline characters, while zip -@ expects linefeeds.
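The newline problem can be reproduced without zip at all. The sketch counts how many records find emits with newline separators (one name per line, which is what zip -@ reads) versus NUL separators. It assumes a file system that allows newlines in file names, as most Unix file systems do. The file names are hypothetical.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# One ordinary name and one containing an embedded newline
touch "good_pattern.txt"
touch "$(printf 'bad\n_pattern.txt')"

# Newline-separated output: the odd name is split across two lines
by_line=$(find . -name "*pattern*.txt" -print | wc -l | tr -d ' ')

# NUL-separated output: keep only the NUL bytes and count them
by_nul=$(find . -name "*pattern*.txt" -print0 | tr -cd '\0' | wc -c | tr -d ' ')

echo "records seen as lines: $by_line, actual files: $by_nul"
```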