Postgresql – Postgres synchronized backup – what’s with the archive

database-recommendationdatabase-theorypostgresqlrecoverytransaction-log

I am setting up a Postgres 9.1 master and a hot standby-server. I read the documentation but I'm not quite sure yet what I should do with my WAL files.

Every commit is done on both servers before returning to the application, therefore archiving the WALs onto some third system seams useless. But image my hot backup server fails, then I will have to manually reconfigure the master instance not use any standyby server. Is it wise to activate the WAL archive only then?

I presume not, because the newly created WALs will be useless for any newly set up hot standby. Should I always have WAL archive active and from time to time do a complete snapshot backup of the running system and then delete the old archived WALs manually by their timestamp? Then I could quickly set up a new hot standy instance if the current hot standby instance fails? Or is it better just to let a new hot standby server catch up per TCP connection because it does some high throughput bulk transfer of the database and only switches to replaying stream WALs when it is nearly up to date? This would be great, because then I wouldn't have to care about WAL archiving at all?

If writing to the database doesn't work for a few hours I don't care, only reads have to be highly available.

Edit: I'm also interested in some details on how a new hot standby would catch up with the master database. Is everything done in transactions or is there some high speed TCP catchup involved, which would be similar (in runtime complexity) to just copying the disks.

Best Answer

Wal archive is useful for backup purposes. For example - you make full backup every day, and keep wal segments since last abckup, thanks to this you can always get database to any state that it was during the day (for example - just before someone did run: drop table users cascade;).