Tar Command – Why tar --exclude Doesn’t Exclude?

tar

I have this very simple line in a bash script. It runs successfully (i.e. it produces the _data.tar file), but it doesn't exclude the sub-directories it is told to exclude via the --exclude option:

/bin/tar -cf /home/_data.tar  --exclude='/data/sub1/*'  --exclude='/data/sub2/*' --exclude='/data/sub3/*'  --exclude='/data/sub4/*'  --exclude='/data/sub5/*'  /data

Instead, it produces a _data.tar file that contains everything under /data, including the files in the subdirectories I wanted to exclude.

Any idea why, and how to fix this?

Update: I applied what I took from the link provided in the first answer below (top-level directory first, no whitespace after the last exclude):

/bin/tar -cf /home/_data.tar  /data  --exclude='/data/sub1/*'  --exclude='/data/sub2/*'  --exclude='/data/sub3/*'  --exclude='/data/sub4/*'  --exclude='/data/sub5/*'

But that didn't help. All "excluded" sub-directories are present in the resulting _data.tar file.

This is puzzling. Whether it comes from the current tar (GNU tar 1.23, on CentOS 6.2, Linux 2.6.32) or from tar's "extreme sensitivity" to whitespace and other easy-to-miss typos, I consider this a bug. For now.

This is horrible: I tried the suggestion below (no trailing /*) and it still doesn't work in the production script:

/bin/tar -cf /home/_data.tar  /data  --exclude='/data/sub1'  --exclude='/data/sub2'  --exclude='/data/sub3'  --exclude='/data/sub4'

I can't see any difference between what I tried and what @Richard Perrin tried, except for the quotes and 2 spaces instead of 1. I am going to try this (must wait for the nightly script to run as the directory to be backed up is huge) and report back.

/bin/tar -cf /home/_data.tar  /data --exclude=/data/sub1 --exclude=/data/sub2 --exclude=/data/sub3 --exclude=/data/sub4

I am beginning to think that all these tar --exclude sensitivities aren't tar's but something in my environment, but then what could that be?

It worked! The last variation (no single quotes, and a single space instead of a double space between the --exclude options) tested working. Weird, but I'll accept it.

Unbelievable! It turns out that an older version of tar (1.15.1) would only exclude if the top-level dir is last on the command line. This is the exact opposite of what version 1.23 requires. FYI.
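Since waiting for a nightly run makes every variation slow to confirm, the ordering question can be checked in seconds on a throwaway tree. What follows is only a minimal sketch, assuming a tar that accepts -C and --exclude (GNU tar does); the temporary directory and the sub0/sub1 names are made up for illustration and nothing under /data is touched. It makes no claim about which ordering a given version accepts; it just shows what the local tar does.

```shell
#!/bin/sh
# Throwaway tree standing in for /data (hypothetical names, nothing real is touched).
work=$(mktemp -d)
mkdir -p "$work/data/sub0" "$work/data/sub1"
echo foo > "$work/data/sub0/foo"
echo foo > "$work/data/sub1/foo"

# Ordering 1: --exclude before the directory to archive.
tar -C "$work" -cf "$work/before.tar" --exclude=data/sub1 data \
    || echo "tar reported a problem with ordering 1"

# Ordering 2: --exclude after the directory to archive.
tar -C "$work" -cf "$work/after.tar" data --exclude=data/sub1 \
    || echo "tar reported a problem with ordering 2"

# List each archive; data/sub1 is absent wherever the exclude took effect.
for name in before after; do
    echo "== $name.tar =="
    tar -tf "$work/$name.tar" 2>/dev/null || echo "(no usable archive)"
done

# When finished, remove the scratch directory by hand: rm -rf "$work"
```

Whichever listing lacks data/sub1 shows an ordering that the installed tar honours, and that ordering can then be used in the production script.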

Best Answer

If you want to exclude an entire directory, your pattern should match that directory, not the files within it. Use --exclude=/data/sub1 instead of --exclude='/data/sub1/*'.

Be careful with quoting the patterns to protect them from shell expansion.

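See this example. The final invocation goes wrong: with a space instead of = after --exclude, the unquoted /tmp/data/sub[1-2] is a word of its own that matches existing directories, so the shell expands it into two words. tar then excludes only sub1 and receives sub2 as an extra path to archive. The unquoted --exclude=/tmp/data/sub[1-2] form survives only because no file matches that whole word. The echo commands show what tar actually receives: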

$ for i in 0 1 2; do mkdir -p /tmp/data/sub$i; echo foo > /tmp/data/sub$i/foo; done
$ find /tmp/data
/tmp/data
/tmp/data/sub2
/tmp/data/sub2/foo
/tmp/data/sub0
/tmp/data/sub0/foo
/tmp/data/sub1
/tmp/data/sub1/foo
$ tar -zvcf /tmp/_data.tar /tmp/data --exclude='/tmp/data/sub[1-2]'
tar: Removing leading `/' from member names
/tmp/data/
/tmp/data/sub0/
/tmp/data/sub0/foo
$ tar -zvcf /tmp/_data.tar /tmp/data --exclude=/tmp/data/sub[1-2]
tar: Removing leading `/' from member names
/tmp/data/
/tmp/data/sub0/
/tmp/data/sub0/foo
$ echo tar -zvcf /tmp/_data.tar /tmp/data --exclude=/tmp/data/sub[1-2]
tar -zvcf /tmp/_data.tar /tmp/data --exclude=/tmp/data/sub[1-2]
$ tar -zvcf /tmp/_data.tar /tmp/data --exclude /tmp/data/sub[1-2]
tar: Removing leading `/' from member names
/tmp/data/
/tmp/data/sub2/
/tmp/data/sub2/foo
/tmp/data/sub0/
/tmp/data/sub0/foo
/tmp/data/sub2/
tar: Removing leading `/' from hard link targets
/tmp/data/sub2/foo
$ echo tar -zvcf /tmp/_data.tar /tmp/data --exclude /tmp/data/sub[1-2]
tar -zvcf /tmp/_data.tar /tmp/data --exclude /tmp/data/sub1 /tmp/data/sub2
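For a longer list of excludes, like the five directories in the question, one way to rule out quoting and spacing mistakes is to build the options in a loop so that each --exclude=PATTERN is exactly one word. This is a rough sketch rather than a definitive recipe: it assumes a tar that accepts -C and --exclude, and the scratch tree and the subkeep directory are invented for the demonstration.

```shell
#!/bin/sh
# Throwaway tree with sub1..sub5 plus one directory to keep (hypothetical names).
work=$(mktemp -d)
for i in 1 2 3 4 5 keep; do
    mkdir -p "$work/data/sub$i"
    echo foo > "$work/data/sub$i/foo"
done

# Collect one --exclude=PATTERN word per directory in the positional parameters.
# Note: this overwrites any arguments the script itself was called with.
set --
for i in 1 2 3 4 5; do
    set -- "$@" "--exclude=data/sub$i"
done

# Show exactly what tar will receive, one argument per line.
printf '%s\n' "$@"

# Excludes come before the directory to archive; -C keeps member names relative.
tar -C "$work" -cf "$work/_data.tar" "$@" data

# Only data/, data/subkeep/ and data/subkeep/foo should be listed.
tar -tf "$work/_data.tar"

# When finished, remove the scratch directory by hand: rm -rf "$work"
```

Because "$@" expands to one word per stored option, no glob expansion or stray whitespace can change what tar sees, which is the failure mode the final invocation in the transcript illustrates.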