Incremental file backup by date

backup rsync storage synchronization

Is there any backup tool that supports partial backup by date? What I mean is that I have some small drives, such as on a laptop, that can't hold a full copy of all the files on an external 2TB drive. I want to keep the latest files, which are usually the ones I'm working with, on the laptop. But at the same time I want to be able to synchronize other drives with limited storage space. So each small drive should contain only files that have been created or modified after a certain cutoff, such as 3 months ago. Say I have 2 laptops and each has a different disk size. For the laptop with the smaller drive, I may only want files from the last month to be present. All files on both laptops should be backed up on the large external drive, along with all the other archived files that are more than 3 months old. So the external drive should mirror both smaller drives and keep them in sync. I've tried using unison but it doesn't seem to support backup by date. Maybe rsync with a few shell scripts could work, but I want to check first whether a solution exists before implementing a new one.

Best Answer

I haven't used any dedicated programs for this, but it is quite easy to organize and fine-tune with a combination of cron, bash, tar (incremental dumps) and/or rsync. In my mind, there are two good solutions, and I use one or both of them depending on the context. I think the first will be more appropriate for you, but I'll describe both here.

Incremental tar archives

The core of this solution is a script that might look something like this:

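#!/bin/bash

# EXCLUDE (file with exclude patterns), DATA (directory to back up) and
# BACKUPS (where archives and snapshot files go) must be set in the
# environment. Cron does not read ~/.bashrc, so when running from cron
# set them in the crontab or at the top of this script.

MODE=$1
OPTS=(--create --no-check-device --bzip2 --verbose -X "$EXCLUDE")

# Iterate over the subdirectories of $DATA; each gets its own archive.
for path in "$DATA"/*/; do
    d=$(basename "$path")
    SNAPSHOT=$BACKUPS/$d.snar
    if [ "$MODE" == full ]; then
        echo "Archiving $d (full)..."
        # Removing the snapshot file makes tar start a new full dump.
        rm -vf "$SNAPSHOT"
        # Archives go to $BACKUPS so they stay out of the data directory.
        ARCHIVE=$BACKUPS/$d.$(date --iso-8601).full.tar.bz2
        tar "${OPTS[@]}" --file="$ARCHIVE" --listed-incremental="$SNAPSHOT" "$DATA/$d"
    fi
    if [ "$MODE" == increment ]; then
        echo "Archiving $d (increment)..."
        # Reusing the snapshot file makes tar archive only what changed.
        ARCHIVE=$BACKUPS/$d.$(date --iso-8601).tar.bz2
        tar "${OPTS[@]}" --file="$ARCHIVE" --listed-incremental="$SNAPSHOT" "$DATA/$d"
    fi
done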

This assumes there are subdirectories in $DATA and backs up each one into a separate archive. If your setup is different, customize the script.

You can schedule the backup in your crontab like so:

# m  h  dom mon dow   command
  44 1  1   */2 *     ~/bin/backup_data full > ~/backups/data/logs/`date --iso-8601`.full.log 2>&1
  44 5  *   *   *     ~/bin/backup_data increment > ~/backups/data/logs/`date --iso-8601`.log 2>&1

As you can see, in this case a full backup is created once every two months, and incremental backups starting from that full dump are created every day. Incremental archives in tar become a problem as soon as you lose one file or even change a timestamp, so it is prudent to create a full dump once in a while.
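Restoring from such a chain means extracting the full dump first and then every incremental archive in the order it was created. Here is a minimal sketch of that, assuming GNU tar; the function name restore_chain and the paths in the usage comment are made up for illustration:

```shell
# Restore a chain of GNU tar incremental archives into a target directory.
# Usage: restore_chain TARGET FULL_ARCHIVE [INCREMENT...]
# Archives must be passed oldest first, starting with the full dump.
restore_chain() (
    target=$1
    shift
    mkdir -p "$target" || exit 1
    for archive in "$@"; do
        # --listed-incremental=/dev/null tells tar to treat the archive as
        # incremental on extraction without reading or writing a snapshot file.
        tar --extract --listed-incremental=/dev/null \
            --file="$archive" --directory="$target" || exit 1
    done
)

# Example (hypothetical paths):
# restore_chain /tmp/restore \
#     /backups/data/projects.2024-01-01.full.tar.bz2 \
#     /backups/data/projects.2024-01-02.tar.bz2
```

With the naming scheme used by the script, a date-sorted listing of one subdirectory's archives should already give the right order, but it is worth checking by eye before restoring anything important.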

As far as synchronizing between machines and removing old files, you should separate that task from the backing up itself, since it really is orthogonal. Of course, use rsync for synchronization, without the --delete option so that you don't lose any data on the large external drive. So your command for that might be:

rsync -av /backups/data /mnt/external

if the external drive is mounted on the laptop. Otherwise, you will need to do it over the network like so:

rsync -av /backups/data user@external:/backups/data

If you want to clean archives older than 90 days from your laptop, you can do so like this:

find /path/to/files -type f -mtime +90 -delete

Again, put these things in your crontab.
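For the other direction in the question (keeping only recently modified files on a small laptop drive), the same find age test can feed a file list to rsync. This is only a sketch, assuming GNU find and an rsync that supports --files-from and --from0; the function names are hypothetical:

```shell
# Print files under directory $1 modified within the last $2 days,
# NUL-separated, as paths relative to $1.
recent_files() (
    cd "$1" || exit 1
    find . -type f -mtime "-$2" -print0
)

# Copy only those recent files from source $1 to destination $2,
# where $3 is the age limit in days.
# rsync reads the NUL-separated list from stdin and recreates the
# relative directory structure under the destination.
sync_recent() {
    recent_files "$1" "$3" | rsync -av --from0 --files-from=- "$1"/ "$2"/
}

# Example (hypothetical paths): pull the last 30 days onto the laptop.
# sync_recent /mnt/external/data /home/me/data 30
```

This only copies files; files on the laptop that have since aged past the limit still have to be removed separately, for instance with the find command shown earlier, and only after they are safely on the external drive.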

Incremental backups with rsync

You can use rsync alone to back things up incrementally. I especially like using timestamped snapshots and hard links for that, and it is just one command. Here is an example close to what I normally use:

rsync --verbose --progress --stats --human-readable --archive --link-dest=/backups/data/`date --iso-8601 -d "one day ago"` /data/ /backups/data/`date --iso-8601`/

which creates hard links to the snapshot from the previous day (the one given by --link-dest) for files that have not changed, so unchanged files take almost no extra space. If you will be running the backup irregularly, you can use a symbolic link that points to the latest snapshot and update that symlink after each backup, like so:

rsync --verbose --progress --stats --human-readable --archive --link-dest=/backups/data/last /data/ /backups/data/`date --iso-8601`/ && rm -rvf /backups/data/last && ln -vs /backups/data/`date --iso-8601`/ /backups/data/last

On top of this, you will need to organize the synchronization with the external drive and delete old snapshots. This, generally, is done the same way as in the first solution I outlined above. However, when rsyncing snapshots between machines, make sure to use the -H option to preserve the hard links.
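One caveat when deleting old snapshots: rsync --archive preserves modification times, so the files and directories inside a snapshot carry the timestamps of the original data, not of the backup run. An age test like find -mtime is therefore not a reliable way to pick snapshots to remove. A sketch of pruning by the date in the directory name instead, assuming GNU date and snapshot directories named YYYY-MM-DD as in the commands above (the function name prune_snapshots is hypothetical):

```shell
# Remove snapshot directories under $1 whose name (YYYY-MM-DD) is
# older than $2 days. Anything not named like a date is left alone,
# including the "last" symlink.
prune_snapshots() (
    root=$1
    days=$2
    cutoff=$(date --iso-8601 -d "$days days ago") || exit 1
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        name=$(basename "$dir")
        case $name in
            [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
            *) continue ;;
        esac
        # ISO dates sort chronologically when compared as strings.
        if expr "$name" \< "$cutoff" >/dev/null; then
            rm -rf -- "${dir%/}"
        fi
    done
)

# Example (hypothetical path): keep roughly the last 90 days.
# prune_snapshots /backups/data 90
```

Because unchanged files are hard links, removing an old snapshot directory does not affect the same files in newer snapshots; the data is freed only when the last link to it is gone.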

Summary

Compared to the solution using tar, the second one in my mind is somewhat simpler to manage and has all the files available at all times. Using archives, on the other hand, makes use of compression, uses fewer inodes, and has other pros on non-server machines.

Again, do all this in crontab whenever possible, so you don't have to remember to do it. If you don't have the laptop turned on all the time, choose a time when it is often in use, and perhaps schedule the job several times a day so that at least some of the cron jobs start. Better yet, use something like anacron.

You can also run the backup script by hand, and use finer-grained timestamps in the filenames/directories if you want to do incrementals more than once a day. Obviously, you will need to play around with these solutions to make them fit your use case.

Update: a repository with an example script I use: https://github.com/langner/backup.sh/blob/master/backup.sh
