Tar "--exclude-from" double star wildcard

tar wildcards

I wrote a backup script on my Debian 8 system which uses the tar command with "--exclude-from" to exclude some files and directories.

This works great, but today I would like to exclude some files sharing the same path pattern, like /home/www-data/sites/<some_string>log.txt or directories like /home/www-data/sites/<one_or_two_directories>/vendor.

I tried appending /home/www-data/sites/*log.txt to the file, but tar fails and prints the following errors on stderr:

tar: /home/www-data/sites/*log.txt: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors

Did I miss something when trying to use * or **?

I then read that in Unix, programs usually do not interpret wildcards themselves, which would mean that neither * nor ** is expanded by tar.

As far as I know, my last resort here is to expand the list using bash and append the results to the exclusion file (if they are not already there) prior to the tar call. Is there a cleaner way?

EDIT

Here is the script snippet:

# ...
broot=$(dirname "${PWD}")
i="${PWD}/list.include"
x="${PWD}/list.exclude"
o="$broot/archive.tgz"
tar -zpcf "$o" -T "$i" -X "$x"
# ...

And here is the exclusion file:

/etc/php5/fpm
/etc/nginx
/etc/mysql
/home/me/websites/*log.txt
/home/me/websites/**/vendor

The goal is to exclude all log files located inside the "websites" directory, and all "vendor" directories found in any subdirectory of "websites".

Thank you!

Best Answer

The shell expands wildcards in arguments, so most applications don't need to perform any wildcard expansion themselves. However, tar's exclude list does support wildcards, and they happen to match the wildcards supported by traditional shells. Beware that there may be slight differences; for example, unlike ksh, bash and zsh, tar doesn't distinguish between * and **. With tar, * can match any character including /, so for example */.svn excludes a file called .svn at any level of the hierarchy. You can pass tar the option --no-wildcards-match-slash, in which case * doesn't match directory separators.

For example, excluding /home/me/websites/*log.txt excludes /home/me/websites/log.txt, /home/me/websites/foo-log.txt and /home/me/websites/subdir/log.txt. Excluding /home/me/websites/**/vendor excludes /home/me/websites/one/vendor and /home/me/websites/one/two/vendor but not /home/me/websites/vendor. With the --no-wildcards-match-slash option, /home/me/websites/*log.txt does not exclude /home/me/websites/subdir/log.txt and /home/me/websites/**/vendor does not exclude /home/me/websites/one/two/vendor.
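The behaviour described above can be checked on a throwaway tree. The following is only a sketch, assuming GNU tar; the temporary directory, the file names and the list_members helper are made up for the demonstration, and relative paths are used instead of /home/me/websites so that nothing outside the temporary directory is touched.

```shell
#!/bin/sh
# Sketch only: assumes GNU tar. All names below are hypothetical demo names.
demo=$(mktemp -d)
mkdir -p "$demo/websites/one/two/vendor" "$demo/websites/one/vendor" \
         "$demo/websites/vendor" "$demo/websites/subdir"
touch "$demo/websites/log.txt" "$demo/websites/foo-log.txt" \
      "$demo/websites/subdir/log.txt" "$demo/websites/index.php"

# list_members (hypothetical helper): archive "websites" with the given
# extra tar options and print the sorted member list of the archive.
list_members() {
    tar -C "$demo" -cf - "$@" websites | tar -tf - | sort
}

# Default exclude matching: * (and **) may match "/".
default=$(list_members --exclude='websites/*log.txt' \
                       --exclude='websites/**/vendor')

# Same patterns, but * no longer crosses directory separators.
# The option is placed before the patterns because, as far as I know,
# GNU tar applies matching options to the patterns that follow them.
strict=$(list_members --no-wildcards-match-slash \
                      --exclude='websites/*log.txt' \
                      --exclude='websites/**/vendor')

printf 'default matching:\n%s\n\n' "$default"
printf 'with --no-wildcards-match-slash:\n%s\n' "$strict"

rm -rf "$demo"
```

Comparing the two listings should show websites/subdir/log.txt and websites/one/two/vendor/ only in the second one, while websites/vendor/ appears in both.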

tar … --exclude='/home/www-data/sites/*include' … excludes the files and directories under /home/www-data/sites whose name ends with include. You might get away without the quotes, but not if you write --exclude /home/www-data/sites/*include (because then the shell would expand the wildcards before tar can see them) or if you use a shell that signals an error on non-matching wildcards (e.g. zsh in its default — and recommended — configuration).
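To see what tar would actually receive in each case, it can help to print the arguments instead of running tar. This is a small sketch for a POSIX-style shell such as sh or bash; the show_args helper and the sites/*.include files are invented for the illustration.

```shell
#!/bin/sh
# Sketch only: show_args and the demo files are hypothetical.
demo=$(mktemp -d)
mkdir "$demo/sites"
touch "$demo/sites/a.include" "$demo/sites/b.include"

# show_args prints every argument it receives on its own line, in brackets.
show_args() { printf '[%s]\n' "$@"; }

# Quoted: the pattern reaches the program untouched.
quoted=$(cd "$demo" && show_args --exclude='sites/*include')

# Unquoted, as a separate word: the shell expands it to the matching files.
separate=$(cd "$demo" && show_args --exclude sites/*include)

printf 'quoted:\n%s\n\nunquoted, separate word:\n%s\n' "$quoted" "$separate"

rm -rf "$demo"
```

In the second call the program sees --exclude followed by two file names, so tar would treat only the first name as the pattern and the second as something to archive.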

The option --exclude-from requires a file name. The file must contain one pattern per line. Do not confuse --exclude (followed by a pattern) and --exclude-from (followed by the name of a file containing patterns).
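Applied to a pattern file like the one in the question, this could look roughly as follows. It is a sketch assuming GNU tar, run against a temporary tree with invented file names rather than the real /home/me/websites; -X is the short form of --exclude-from.

```shell
#!/bin/sh
# Sketch only: assumes GNU tar. Paths and file names are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/websites/app/vendor" "$demo/websites/app/src"
touch "$demo/websites/app/error-log.txt" \
      "$demo/websites/app/src/main.php" \
      "$demo/websites/app/vendor/lib.php"

# One pattern per line; no quoting is needed inside the file because
# the shell never sees these lines.
cat > "$demo/list.exclude" <<'EOF'
websites/*log.txt
websites/**/vendor
EOF

# -X takes the NAME of the pattern file, not a pattern.
tar -C "$demo" -zcf "$demo/archive.tgz" -X "$demo/list.exclude" websites

members=$(tar -ztf "$demo/archive.tgz")
printf '%s\n' "$members"

rm -rf "$demo"
```

The listing should contain websites/app/src/main.php but neither the log file nor anything under vendor.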
