Mac OS X: selectively zip a large number of files; is the -@ option OK?

I need to make a zip file archiving ~100k files from a directory containing ~500k files. I get "argument list too long" errors when I try the obvious commands:

zip archive.zip *pattern*.txt                        # fails
zip archive.zip `find . -name "*pattern*.txt"`       # fails

One approach is to use the -@ option to feed the list of file names to zip via standard input:

find . -name "*pattern*.txt" | zip archive.zip -@

However, the zip man page says:

If a file list is specified as -@ [Not on MacOS], zip takes the list of input files from standard input instead of from the command line.

It's the "Not on MacOS" that is bugging me. I went ahead and tried the -@ option, and it seems to work; but I'm feeling nervous about whether it's really doing the right job (archiving all the files, intact).

Here are my questions:

  1. Why would -@ not be OK on MacOS?
  2. Are there some versions of MacOS/bash/zip where this warning is true, and others where it is not? Is this an obsolete warning, and if so, where is the dividing line?
  3. What would be a viable approach for this problem without using -@?

Note that the solution given in "zip: Argument list too long (80.000 files in overall)" will not work for me: I need to archive only some of the files in the directory, not all of them.

I'm running Mac OS 10.7.5. Here is some version info:

$ bash --version
GNU bash, version 3.2.48(1)-release (x86_64-apple-darwin11)
$ zip --version
This is Zip 3.0 (July 5th 2008), by Info-ZIP.
Compiled with gcc 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00) for Unix (Mac OS X) on Jun 24 2011.

Best Answer

First of all,

zip archive.zip `find . -name "*pattern*.txt"`

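is never a good idea. File names can contain spaces, newline characters, parts that could be interpreted as options, and so on. The shell splits the output of the backticks on whitespace, so a single name containing a space becomes several separate arguments.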
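To see the word-splitting problem concretely, here is a small bash demo. It creates two hypothetical matching files in a temporary directory, one of them with spaces in its name, and counts how many arguments an unquoted $(find ...) expansion produces (backticks behave the same way).

```shell
#!/usr/bin/env bash
# Demo: why passing `find` output unquoted on a command line is fragile.
# Assumption: bash; the file names are made up for illustration.
set -eu
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo"
touch "my pattern file.txt" "plain-pattern.txt"

# How many files actually match?
found=$(( $(find . -name '*pattern*.txt' | wc -l) ))

# Unquoted command substitution is split on whitespace, so the name
# "my pattern file.txt" turns into three separate words.
count=0
for arg in $(find . -name '*pattern*.txt'); do
  count=$((count + 1))
  printf 'argument: %s\n' "$arg"
done
printf 'files found: %d, arguments produced: %d\n' "$found" "$count"
```

It reports 2 files found but 4 arguments produced; zip would then look for files named ./my, pattern and file.txt, none of which exist.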

To perform an action for every found file, you can use the -exec switch or xargs.

find . -name "*pattern*.txt" -exec zip archive.zip {} +

adds the matching files to the archive. Here, {} is replaced by the names of the files find has found.

Terminating the -exec argument with + instead of \; makes find pass as many file names to each zip invocation as it can without hitting the argument-length limit behind the errors you're getting. That is considerably faster than running zip once per file. If the names don't all fit on one command line, find simply runs zip again, and each run adds to the same existing archive.

find . -name "*pattern*.txt" -print0 | xargs -0 zip archive.zip

does essentially the same thing: by default, xargs passes as many file names to each zip invocation as will fit on a command line.

The -print0 switch to find and the -0 switch to xargs make them use NUL characters as file-name separators, so unusual file names are handled correctly.

I don't know why -@ isn't recommended for Mac OS[1], but find ... | zip -@ will not handle unusual file names (specifically, names containing newline characters) correctly, because zip reads one name per line. This is true regardless of the operating system.

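[1] I'm guessing this applies only to classic Mac OS (up to version 9.x), which used carriage returns rather than linefeeds as line terminators, while zip -@ expects linefeeds. The zip --version output in the question is consistent with this: that binary was compiled for Unix (Mac OS X), not for classic Mac OS.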