I'm trying to create a backup script. I've managed to get this script working fine on a CentOS 6.7 machine and am now trying to get it working properly on Debian 7.
I am running into a problem I can't seem to solve with Google or any of the information found on this site. I'll try to explain my situation before getting into the problem.
On CentOS, I use the following command to find files in $SOURCEDIR that have been changed in the past 24 hours and use xargs to put only these files into $ARCHIVE. If no files are found, a message pops up.
find $SOURCEDIR -mtime -1 -print | xargs -r tar rcvf $ARCHIVE || { echo "No files have been changed in the past 24 hours. Exiting script ..." ; exit 1; }
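A side note on the || fallback, sketched under the assumption of GNU findutils: with -r, xargs does not run the command at all when it receives no input, and it then exits with status 0. The pipeline above may therefore succeed silently instead of printing the message. The snippet below demonstrates that and shows one way to test for changed files explicitly. has_changed_files is a hypothetical helper name, and ! -type d (skip directories) anticipates the directory issue discussed in the answer.

```shell
#!/bin/sh
# Assumes GNU findutils. With -r and no input, xargs never runs the command
# and exits with status 0, so a "|| { ...; }" fallback after it does not fire.
printf '' | xargs -r false
echo "xargs -r with empty input exited with status $?"

# has_changed_files is a hypothetical helper: it succeeds only if find reports
# at least one non-directory under $1 modified in the past 24 hours.
has_changed_files() {
    [ -n "$(find "$1" -mtime -1 ! -type d -print | head -n 1)" ]
}

# Example: a freshly created, empty scratch directory has no recent files.
scratch=$(mktemp -d)
if has_changed_files "$scratch"; then
    echo "changes found"
else
    echo "No files have been changed in the past 24 hours."
fi
rmdir "$scratch"
```

In a script, the check would go before the tar step, for example: if ! has_changed_files "$SOURCEDIR"; then echo "..."; exit 1; fi.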
I am aware that using tar rcvf can produce the following error message:
You may not specify more than one '-Acdtrux' or '--test-label' option
This, however, does not seem to happen on the CentOS machine. It does on the Debian machine, so I've removed the r option from the tar command. I added it in the first place because I want to avoid the archive being overwritten if find returns more than 100 results.
Now on to the actual problem. Whenever I run
find $SOURCEDIR -mtime -1 -print
I get a list of the files in $SOURCEDIR that have been changed in the past 24 hours, as expected. However, whenever I run the complete command, including the pipe symbol and the xargs command, like this:
find $SOURCEDIR -mtime -1 -print | xargs -r tar cvf $ARCHIVE || { echo "No files have been changed in the past 24 hours. Exiting script ..." ; exit 1; }
I actually see the find command print all files from $SOURCEDIR before I end up with an archive including all the files from $SOURCEDIR, and I do not understand why. Any help would be greatly appreciated.
Best Answer
As others have identified, the problem with your command is that it includes directories, and tar archives them recursively. If a directory has been modified recently, all the files in it and its subdirectories get included, whether they have been modified or not.
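To see the recursion problem in isolation, here is a small reproduction in a throwaway directory. It assumes GNU find, tar and touch (touch -d '3 days ago' is a GNU extension), and the names src, old.txt, new.txt and recent.tar are made up for the demo. Creating new.txt updates the modification time of src itself, so find prints src, and tar then archives everything under it, including the old file.

```shell
#!/bin/sh
# Reproduction in a scratch directory; assumes GNU find, tar and touch.
# The names src, old.txt, new.txt and recent.tar are made up for the demo.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir src
echo old > src/old.txt
touch -d '3 days ago' src/old.txt   # backdate: not a recent change
echo new > src/new.txt              # creating an entry also updates src's mtime

# find reports the directory itself, because its mtime is recent:
find src -mtime -1 -print           # src and src/new.txt

# tar is handed "src" and recurses into it, so old.txt is archived too:
find src -mtime -1 -print | xargs -r tar cf recent.tar
tar tf recent.tar
```

new.txt also shows up twice in the listing, once through src and once as its own argument.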
If you don't care to back up directory metadata, then just tell find not to print directory names. It isn't enough to omit the root: the same thing can happen with subdirectories too.
Using xargs fails with file names containing spaces and some other special characters. This is easy to fix: use -exec instead of xargs.
If you want to back up directory metadata, let find print everything and instead tell tar not to recurse into subdirectories. Since find is doing the recursion, tar doesn't need to.
With this approach, you can avoid the use of tar -rc and instead solve the problem of repeated tar invocations by first creating an archive with only the root directory, and then appending to it in batches. (Why the root directory? Because GNU tar is afraid of creating an empty archive.)
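The two approaches in this answer can be sketched roughly as follows, assuming GNU find and GNU tar. The function names backup_files_only and backup_with_dirs are hypothetical, and so is the -mindepth 1 in the second function, which is only there to avoid appending the root directory a second time. In both cases the archive should live outside the source directory and should not exist yet, because -r appends to whatever is already there.

```shell
#!/bin/sh
# Sketches of the two approaches; assumes GNU find and GNU tar.
# Function names are hypothetical. $2 (the archive) should be outside $1
# and should not exist yet, because tar -r appends to an existing archive.

# Approach 1: leave directories out, and let find batch the file names itself
# with -exec ... {} +, which passes names with spaces intact. find may run tar
# more than once, so append with -r instead of creating with -c.
backup_files_only() {
    src=$1
    archive=$2
    find "$src" -mtime -1 ! -type d -exec tar -rvf "$archive" {} +
}

# Approach 2: keep directory entries (and their metadata) but stop tar from
# recursing, since find already does that. The archive is created first with
# only the root directory, because GNU tar refuses to create an empty archive.
# -mindepth 1 (an addition here) keeps the root from being appended twice.
backup_with_dirs() {
    src=$1
    archive=$2
    tar -cf "$archive" --no-recursion "$src" &&
        find "$src" -mindepth 1 -mtime -1 -exec tar -rvf "$archive" --no-recursion {} +
}
```

In the question's script, either function could stand in for the find | xargs pipeline, for example backup_with_dirs "$SOURCEDIR" "$ARCHIVE".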