Debian – ‘find -mtime -1 -print | xargs tar’ archives all files from directory ignoring the -mtime -1 argument

debian find tar xargs

I'm trying to create a backup script. I've managed to get this script working fine on a CentOS 6.7 machine and am now trying to get it working properly on Debian 7.

I am running into a problem I can't seem to solve with Google or any of the information found on this site. I'll try to explain my situation before getting into the problem.

On CentOS, I use the following command to find files in $SOURCEDIR that have been changed in the past 24 hours, and use xargs to put only these files into $ARCHIVE. If no files are found, a message is printed.

find $SOURCEDIR -mtime -1 -print | xargs -r tar rcvf $ARCHIVE || { echo "No files have been changed in the past 24 hours. Exiting script ..." ; exit 1; }

I am aware that using tar rcvf can invoke the following error message:

You may not specify more than one '-Acdtrux' or '--test-label' option

This, however, does not seem to happen on the CentOS machine. It does happen on the Debian machine, so I've removed the r option from the tar command. The reason I added it in the first place is that I want to avoid the archive being overwritten if find returns more than 100 results.

Now onto the actual problem. Whenever I run

find $SOURCEDIR -mtime -1 -print

I get a list of the files that have been changed in $SOURCEDIR in the past 24 hours, as expected. However, whenever I run the complete command including the pipe symbol and the xargs command like this:

find $SOURCEDIR -mtime -1 -print | xargs -r tar cvf $ARCHIVE || { echo "No files have been changed in the past 24 hours. Exiting script ..." ; exit 1; }

I actually see the find command print all files from $SOURCEDIR before I end up with an archive including all the files from $SOURCEDIR, and I do not understand why. Any help would be greatly appreciated.

Best Answer

As others have identified, the problem with your command is that it includes directories, and tar archives them recursively. If a directory has been modified recently, all the files in it and its subdirectories get included, whether they have been modified or not.
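The effect can be reproduced in a scratch directory. The following is only a minimal sketch, assuming GNU find, tar and touch (as on Debian); the directory and file names are made up for the demonstration and do not come from the question.

```shell
#!/bin/sh
# Hypothetical scratch layout, used only for this demonstration.
demo=$(mktemp -d)
mkdir "$demo/src"
touch -d '3 days ago' "$demo/src/old.txt"   # content not modified recently
touch "$demo/src/new.txt"                   # creating a file also updates the mtime of src/

# find reports the directory itself next to new.txt, because adding an
# entry to a directory changes the directory's modification time.
recent=$(cd "$demo" && find src -mtime -1 -print | sort)
printf 'find reports:\n%s\n' "$recent"

# Handed the directory name, tar recurses into it and archives old.txt too.
(cd "$demo" && find src -mtime -1 -print | xargs -r tar -cf demo.tar)
archived=$(tar -tf "$demo/demo.tar")
printf 'tar archived:\n%s\n' "$archived"

rm -rf "$demo"
```

find never lists old.txt, yet it ends up in the archive because tar received src and walked it on its own.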

If you don't care to back up directory metadata, then just tell find not to print directory names. It isn't enough to omit the root: the same thing can happen with subdirectories too.

find "$SOURCEDIR" -mtime -1 ! -type d -print | xargs -r tar -rcf "$ARCHIVE"

Using xargs fails with file names containing spaces and some other special characters. This is easy to fix: use -exec instead of xargs.

find "$SOURCEDIR" -mtime -1 ! -type d -exec tar -rcf "$ARCHIVE" {} +
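If you would rather keep the pipeline, GNU find and GNU xargs can also exchange names separated by NUL bytes (-print0 and -0), which sidesteps the whitespace problem. This is a sketch under that assumption, with a made-up scratch directory; it uses tar -cf only because the demonstration produces a single small batch.

```shell
#!/bin/sh
# Hypothetical scratch layout, used only for this demonstration.
demo=$(mktemp -d)
mkdir "$demo/src"
touch "$demo/src/file with spaces.txt"

# -print0 terminates each name with a NUL byte and xargs -0 splits on NUL,
# so spaces, quotes and newlines in file names are passed through intact.
(cd "$demo" && find src -mtime -1 ! -type d -print0 | xargs -0 -r tar -cf nul.tar)

listed=$(tar -tf "$demo/nul.tar")
printf '%s\n' "$listed"

rm -rf "$demo"
```

Both -print0 and -0 are extensions, so the -exec ... {} + form remains the more portable choice.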

If you want to back up directory metadata, let find print everything and instead tell tar not to recurse into subdirectories. Since find is doing the recursion, tar doesn't need to.

find "$SOURCEDIR" -mtime -1 -exec tar -rcf "$ARCHIVE" --no-recursion {} +

With this approach, you can avoid the use of tar -rc (a combination that the tar on Debian rejects, as described in the question) and instead solve the problem of repeated tar invocations by first creating an archive with only the root directory, and then appending to it in batches. (Why the root directory? Because GNU tar is afraid of creating an empty archive.)

tar -cf "$ARCHIVE" --no-recursion "$SOURCEDIR"
find "$SOURCEDIR" -mindepth 1 -mtime -1 -exec tar -rf "$ARCHIVE" --no-recursion {} +
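Putting the two commands together with the "nothing changed" message from the question might look like the sketch below. It assumes GNU find and tar; the function name backup_recent and the scratch directories are hypothetical, and the check with -print -quit is one possible way to detect an empty result, not something taken from the original script.

```shell
#!/bin/sh
# backup_recent SRC ARCHIVE (hypothetical helper)
# Archives entries under SRC changed in the past 24 hours into ARCHIVE.
backup_recent() {
    src=$1
    archive=$2
    # -print -quit stops at the first match, so this only tests for existence.
    if [ -z "$(find "$src" -mindepth 1 -mtime -1 -print -quit)" ]; then
        echo "No files have been changed in the past 24 hours." >&2
        return 1
    fi
    # Start a fresh archive holding only the root directory entry,
    # then append the matches in as many batches as find needs.
    tar -cf "$archive" --no-recursion "$src" &&
    find "$src" -mindepth 1 -mtime -1 -exec tar -rf "$archive" --no-recursion {} +
}

# Demonstration in a scratch directory (all names are made up).
work=$(mktemp -d)
mkdir -p "$work/src/sub" "$work/stale"
touch -d '3 days ago' "$work/src/sub/old.txt" "$work/stale/old.txt"
touch "$work/src/sub/new file.txt"

(cd "$work" && backup_recent src backup.tar)
status=$?
contents=$(tar -tf "$work/backup.tar")
printf '%s\n' "$contents"

# A tree with nothing recent inside it reports the message and fails.
(cd "$work" && backup_recent stale stale.tar)
empty_status=$?

rm -rf "$work"
```

Because tar is told not to recurse, sub/ is stored as a directory entry only, and old.txt stays out of the archive even though its parent directory changed.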