Linux – Best Backup Utility for Incremental Backups

I'm looking for a backup utility with incremental backups, but in a more complicated way.

I tried rsync, but it doesn't seem to be able to do what I want, or more likely, I don't know how to make it do that.

So this is an example of what I want to achieve with it.
I have the following files:

testdir
├── picture1
├── randomfile1
├── randomfile2
└── textfile1

I want to run the backup utility and basically create an archive (or a tarball) of all of these files in a different directory:

$ mystery-command testdir/ testbak
testbak
└── 2020-02-16--05-10-45--testdir.tar

Now, let's say the following day, I add a file, such that my structure looks like:

testdir
├── picture1
├── randomfile1
├── randomfile2
├── randomfile3
└── textfile1

Now when I run the mystery command, I will get another tarball for that day:

$ mystery-command testdir/ testbak
testbak
├── 2020-02-16--05-10-45--testdir.tar
└── 2020-02-17--03-24-16--testdir.tar

Here's the kicker: I want the backup utility to detect that picture1, randomfile1, randomfile2 and textfile1 have not changed since the last backup, and to back up only the new or changed files, which in this case is just randomfile3, such that:

tester@raspberrypi:~ $ tar -tf testbak/2020-02-16--05-10-45--testdir.tar 
testdir/
testdir/randomfile1
testdir/textfile1
testdir/randomfile2
testdir/picture1
tester@raspberrypi:~ $ tar -tf testbak/2020-02-17--03-24-16--testdir.tar 
testdir/randomfile3

So as a last example, let's say the next day I changed textfile1, and added picture2 and picture3:

$ mystery-command testdir/ testbak
testbak/
├── 2020-02-16--05-10-45--testdir.tar
├── 2020-02-17--03-24-16--testdir.tar
└── 2020-02-18--01-54-41--testdir.tar
tester@raspberrypi:~ $ tar -tf testbak/2020-02-16--05-10-45--testdir.tar 
testdir/
testdir/randomfile1
testdir/textfile1
testdir/randomfile2
testdir/picture1
tester@raspberrypi:~ $ tar -tf testbak/2020-02-17--03-24-16--testdir.tar 
testdir/randomfile3
tester@raspberrypi:~ $ tar -tf testbak/2020-02-18--01-54-41--testdir.tar 
testdir/textfile1
testdir/picture2
testdir/picture3

With this system, I would save space by backing up only the incremental changes between backups (plus, obviously, the master backup that has all the initial files). I would also keep a history of those changes: for example, if I changed a file on day 2 and changed the same file again on day 3, I could still get the version with the day 2 change but without the day 3 change.

I think it's kinda like how GitHub works 🙂

I know I could probably write a script that runs a diff and then selects the files to back up based on the result (or, more efficiently, just computes a checksum and compares), but I want to know if there's any utility that can do this a tad easier 🙂

Best Answer

Update:

Please see some caveats here: Is it possible to use tar for full system backups?

According to that answer, restoration of incremental backups with tar is prone to errors and should be avoided. Do not use the below method unless you're absolutely sure you can recover your data when you need it.


According to the GNU tar documentation, you can use the -g/--listed-incremental option to create incremental tar files, e.g.:

tar -cg data.inc -f DATE-data.tar /path/to/data

Then the next time, run something like:

tar -cg data.inc -f NEWDATE-data.tar /path/to/data

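Here data.inc is the snapshot file in which tar records its incremental metadata, and the DATE-data.tar files are your incremental archives. The first run, when data.inc does not exist yet, produces a full backup; each later run archives only what has changed since the state recorded in data.inc.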
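To get something closer to the mystery-command from the question, the two commands above can be wrapped in a small script. This is only a rough sketch, assuming GNU tar: the function names incr_backup and incr_restore and the NAME.snar snapshot file name are made up for illustration, and the restore step follows the GNU tar manual's suggestion of extracting every archive in order with --listed-incremental=/dev/null.

```shell
# incr_backup SRC DEST
# Writes DEST/<timestamp>--<name>.tar with GNU tar's --listed-incremental.
# The snapshot file DEST/<name>.snar is a hypothetical name chosen here.
incr_backup() {
    src=${1%/}                      # drop a trailing slash, e.g. testdir/
    dest=$2
    name=$(basename "$src")
    snar="$dest/$name.snar"
    mkdir -p "$dest"
    stamp=$(date +%Y-%m-%d--%H-%M-%S)
    # No snapshot file yet: full archive. Otherwise: only new/changed files.
    tar --create --listed-incremental="$snar" \
        --file="$dest/$stamp--$name.tar" "$src"
}

# incr_restore DEST TARGET
# Extracts every archive in DEST into TARGET, oldest first. The timestamped
# names sort chronologically, so the shell glob yields the right order.
incr_restore() {
    dest=$1
    target=$2
    mkdir -p "$target"
    for archive in "$dest"/*.tar; do
        # /dev/null tells tar to treat the archive as incremental without
        # reading or updating a real snapshot file.
        tar --extract --listed-incremental=/dev/null \
            --file="$archive" -C "$target"
    done
}
```

With these defined, running incr_backup testdir/ testbak once a day should produce one tarball per run, and incr_restore testbak restored should rebuild the latest state. Note that an incremental archive also contains the directory entries themselves, so tar -tf will list testdir/ alongside the changed files, and that incremental extraction may delete files in the target that did not exist when the archive was made. Given the caveats in the update above, try a restore on scratch data before relying on this.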
