PostgreSQL Incremental Backup with wal-e – How to Guide

backuppostgresql

I am new to postgresql. I've connected to my demo database with psycopg2 and inserted data with python faker. I want to take incremental backup with the tool WAL-E and want to see how it actually works. But the few tutorials aren't helping that much as they're not for naive users.

Can someone help explaining the steps simply how to backup data with WAL-E? That would be a great help ! I'm using psql 10.4 in Ubuntu 18.04.

Thanks

Best Answer

If you understand the base backup and WAL archives in Postgresql, then you would have understood the WAL-E.

Base Backup:

Its also called as the Hot backup, this will not interrupt any other transaction inside your PostgreSQL. Once you initiated the basebackup it'll wait for a checkpoint. If the checkpoint is entered, then it'll copy the entire data directory to a file. And these backups are the starting points for replication or POINT IN TIME recovery. Generally, basebackups are not consistent and may have corrupted data. Here the WAL files have come into the picture.

WAL:

WAL files will get all data changes which are happening inside the database. These changes are made in the WAL files first and then actual database after that. Thats why its called as Write Ahead Log.

What is consistent backup:

consistent backup = basebackup + WAL Files.

Once the bacebackup may or may not have the committed data. So WAL files are will redo or undo the particular transaction to make the data consistent. WAL files must be applied after restoring the basebackup. This WAL files should be mentioned in restore_command in the recovery file.

WAL-E:

Its an OpenSource PostgreSQL database backup tool which will upload your backups to S3, GCP storage and Azure (+ some other places). Here also it'll perform the basebackup first.

I'll explain this functions with S3.

backup-push:

This will initiate the basebackup and upload the backup file to s3.

backup-fetch:

Restore the basebackup from the S3. LATEST will restore the most recent backup.

wal-push:

It'll get the WAL file using the archive_command and store into the S3.

wal-fetch:

It'll restore your WAL files from S3.

Recovery:

If your primary database is corrupted and you want to restore it from WAL-E,

envdir /etc/wal-e.d/fetch-env wal-e               \
--s3-prefix=s3://some-bucket/directory/or/whatever  \
backup-fetch /var/lib/my/database LATEST

Now use recovery.conf file to restore your WAL files.

restore_command = 'envdir /etc/wal-e.d/env wal-e wal-fetch %f %p'
standby_mode = on

Or, if you want to stop the restore on a particular time, use

recovery_target_time = '2017-02-01 19:58:55'

Then start the PostgreSQL and it'll restore your databases to a consistent state.

How to backup using WAL-E:

Schedule a cronjob for this. This blog post explains this clearly.

Don't wrongly understand the WAL-E with Replication:

WAL-E will push the log files when Postgresql kicks out the WAL logs from the pg_XLOG folder. This will archive based on keep_wal_segments. So this will not restore upto the last commit, but it'll so upto last archived WAL file. Its mainly used for DR not for HA.