PostgreSQL Replication – When to Clean Up Archived WAL Files

postgresqlpostgresql-9.2replication

I need some help understanding the how archived WAL files are used and when I can clean them up.

Currently we have a master database replicating to a slave, but every so often our master database server runs out of disk space because the files archived using archive_command build up and we have to manually remove them. I feel like we're missing something if we have to manually remove old archived WAL files because there are too many that the file system runs out of disk space…

Having said that, I'm not sure I understand the replication process enough to start deleting files and be confident that I'm not screwing anything up. Does our slave use the archived files created by archive_command or does it rely only on the files in $PG_DATA/pg_xlog directory to maintain a consistent state with the master?

If that's the case, and we're doing a pg_dump of our master every six hours, could I update our backup script to also clear out all archive files that exist at the start of the pg_dump (once the pg_dump is complete of course)? from the docs:

Note: pg_dump and pg_dumpall do not produce file-system-level backups
and cannot be used as part of a continuous-archiving solution. Such
dumps are logical and do not contain enough information to be used by
WAL replay.

I just want to make sure that however we clean up the archived files, we still have what we need for replication to continue as well as be able to perform a PITR in since the last base backup.

Our archive command is cp %p /var/lib/postgresql/9.2/archive/%f and here is our recovery.conf file from the slave:

standby_mode = 'on'
primary_conninfo = 'host=<ip-address> port=<port> user=rep password=<password> sslmode=require'
trigger_file = '/tmp/postgresql.trigger.5432'

Best Answer

WAL Files are a record of all the transactions that have taken place .. but that's it - a delta between state before and state after, essentially. Which means you have to have a BASE to start off of before you can apply any WAL Files.

If you clear out WAL archives without creating a new BASE, then you will be in trouble when you try to restore your database (via PITR) from them.

If you are going to delete WAL files, then you need to make sure you have a valid base to build from again afterward.

Read the documentation on WAL archiving for more information.

In terms of streaming replication, I believe that only uses files located in pg_xlog... (files are not archived before they are sent, IIRC)

If that's the case, and we're doing a pg_dump of our master every six hours, could I update our backup script to also clear out all archive files that exist at the start of the pg_dump (once the pg_dump is complete of course)?

Not entirely ... pg_dump creates a file that contains the information of your whole database that can be given to pg_restore... For PITR, you actually need to create a file-level "backup" by copying your database files and WAL files (after issuing a pg_start_backup)