Find – Recursively Search All Archive Files for Filename Patterns

Tags: 7z, find, rar, tar, zip

Ideally, I would like to make a call like this:

$searchtool /path/to/search/ -contained-file-name "*vacation*jpg"

… so that this tool

  • does a recursive scan of the given path
  • picks up all files in supported archive formats, which should include at least the most common ones such as zip, rar, 7z, tar.bz, tar.gz …
  • and scans the file list of each archive for the name pattern in question (here *vacation*jpg)

I'm aware of how to use the find tool, tar, unzip and alike. I could combine these with a shell script but I'm looking for a simple solution that might be a shell one-liner or a dedicated tool (hints to GUI tools are welcome but my solution must be command line based).

Best Answer

(Adapted from How do I recursively grep through compressed archives?)

Install AVFS, a filesystem that provides transparent access inside archives. First run this command once to set up a view of your machine's filesystem in which you can access archives as if they were directories:

mountavfs

After this, if /path/to/archive.zip is a recognized archive, then ~/.avfs/path/to/archive.zip# is a directory that appears to contain the contents of the archive.
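
To make the path mapping concrete, here is a minimal sketch of a helper that prints the AVFS directory view for a given archive. The function name avfs_view and the AVFS_ROOT variable are assumptions made for illustration (they are not part of AVFS); AVFS_ROOT simply defaults to the ~/.avfs mount point mentioned above.

```shell
# avfs_view ARCHIVE
# Hypothetical helper: print the AVFS directory view of ARCHIVE.
# AVFS_ROOT is an assumed override; by default the ~/.avfs mount point is used.
avfs_view() {
    case $1 in
        /*) abs=$1 ;;          # already an absolute path
        *)  abs=$PWD/$1 ;;     # make a relative path absolute
    esac
    # The trailing "#" is what asks AVFS to present the archive as a directory.
    printf '%s\n' "${AVFS_ROOT:-$HOME/.avfs}$abs#"
}
```

With AVFS mounted, something like ls "$(avfs_view /path/to/archive.zip)" should then list the top level of the archive.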

find ~/.avfs"$PWD" \( -name '*.7z' -o -name '*.zip' -o -name '*.tar.gz' -o -name '*.tgz' \) \
     -exec sh -c '
                  find "$0#" -name "$1"
                 ' {} '*vacation*.jpg' \;

Explanations:

  • Mount the AVFS filesystem.
  • Look for archive files in ~/.avfs$PWD, which is the AVFS view of the current directory.
  • For each archive, execute the specified shell snippet (with $0 = archive name and $1 = pattern to search).
  • $0# is the directory view of the archive $0.
  • If you add an inner -exec to the shell snippet, write {\} rather than {} there, in case the outer find substitutes {} inside -exec ; arguments (some do it, some don't).
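
To get closer to the $searchtool call asked for in the question, the command can be wrapped in a small shell function. This is only a sketch under stated assumptions: the name find_in_archives and the AVFS_ROOT variable are made up for illustration, and AVFS is assumed to be mounted already (via mountavfs) at ~/.avfs unless AVFS_ROOT says otherwise.

```shell
# find_in_archives DIR PATTERN
# Hypothetical wrapper: print the files matching PATTERN inside archives below DIR.
# AVFS_ROOT is an assumed override for the mount point (default: ~/.avfs).
find_in_archives() {
    root=${AVFS_ROOT:-$HOME/.avfs}
    # Resolve DIR to an absolute path so it can be appended to the AVFS root.
    dir=$(cd -- "$1" >/dev/null && pwd) || return 1
    # Outer find: locate archives in the AVFS view of DIR.
    # Inner find: search the directory view ("$0#") of each archive for "$1".
    find "$root$dir" \( -name '*.7z' -o -name '*.zip' -o -name '*.tar.gz' -o -name '*.tgz' \) \
        -exec sh -c 'find "$0#" -name "$1"' {} "$2" \;
}

# Example call (requires a mounted AVFS):
#   find_in_archives /path/to/search '*vacation*jpg'
```

The pattern is passed as a separate argument rather than pasted into the snippet, so quotes or spaces in it do not need extra escaping.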

Or in zsh ≥4.3:

mountavfs
ls -l ~/.avfs$PWD/**/*.(7z|tgz|tar.gz|zip)(e\''
     reply=($REPLY\#/**/*vacation*.jpg(.N))
'\')

Explanations:

  • ~/.avfs$PWD/**/*.(7z|tgz|tar.gz|zip) matches archives in the AVFS view of the current directory and its subdirectories.
  • PATTERN(e\''CODE'\') applies CODE to each match of PATTERN. The name of the matched file is in $REPLY. Setting the reply array turns the match into a list of names.
  • $REPLY\# is the directory view of the archive.
  • $REPLY\#/**/*vacation*.jpg matches *vacation*.jpg files in the archive.
  • The N glob qualifier makes the pattern expand to an empty list if there is no match.
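
If AVFS cannot be installed, the same idea can be approximated without it, at least for tar archives. The following is not the method of this answer but a swapped-in fallback sketch: it lists each archive with tar -tf and matches the base name of every member against the pattern. The function name find_in_tars is hypothetical, and the sketch assumes a tar that detects compression automatically when reading (GNU tar and bsdtar do).

```shell
# find_in_tars DIR PATTERN
# Hypothetical fallback without AVFS: print "ARCHIVE: MEMBER" for every member
# of a tar archive below DIR whose base name matches the glob PATTERN.
find_in_tars() {
    find "$1" -type f \( -name '*.tar' -o -name '*.tar.gz' -o -name '*.tgz' -o -name '*.tar.bz2' \) \
        -exec sh -c '
            pattern=$1; shift
            for archive do
                # List the member names only; nothing is extracted.
                tar -tf "$archive" 2>/dev/null | while IFS= read -r member; do
                    case ${member##*/} in
                        $pattern) printf "%s: %s\n" "$archive" "$member" ;;
                    esac
                done
            done
        ' sh "$2" {} +
}
```

Other formats would need their own listing command in place of tar -tf, which is exactly the bookkeeping that AVFS avoids.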