PostgreSQL warm standby: sync WAL files

Tags: data-synchronization, postgresql, standby, write-ahead-logging

I'm new to PostgreSQL (using version 8.4.20) and need to set up a master and a standby server. I am planning to use a warm standby. I have read the PostgreSQL docs and understand the basics, but a few things aren't clear:

  1. If I use log shipping, what program is responsible for moving the logs from the master to the standby? Do I need to set up an rsync job, or does PostgreSQL take care of this automatically?

  2. If I wanted to keep control of this synchronization myself (by using rsync), could I disable PostgreSQL's own synchronization, if such a thing exists?

  3. Assuming I need to set up an rsync job, do I sync the archive directory or the main PostgreSQL log directory?

  4. Is it possible to combine log shipping (to cover longer outages) and streaming/SR (for record level synchronization)? Is that a desirable thing?

  5. Is it possible to script something on the standby that tells me when it cannot sync with the master because too many old archive files are missing, so that I can send myself an alert or run some other script?

Best Answer

If you use log shipping, the archive_command on the master and the restore_command on the replica are responsible for copying the logs. What those commands contain is up to you: you can use scp, rsync, tar, ssh, or a plain cp to and from a shared network folder, among any number of other things. It is quite easy to come up with commands that generally work but fail in some edge cases, so you might want to use an existing tool that does it for you, like Barman or pgBackRest, as they are more likely to get the edge cases right.
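To make the "edge cases" point concrete, here is a minimal sketch of a hand-rolled archive helper in Python. The script name, the function names, and the archive path are all hypothetical; it only illustrates two of the usual pitfalls (overwriting an existing segment and leaving a half-written file behind) and is not a substitute for a maintained tool.

```python
#!/usr/bin/env python3
"""Hypothetical archive_command helper: copy one WAL segment into an archive directory."""
import os
import shutil
import sys


def archive_wal(src_path, archive_dir):
    """Copy src_path into archive_dir. Return 0 on success, non-zero on failure."""
    dest = os.path.join(archive_dir, os.path.basename(src_path))
    if os.path.exists(dest):
        # Never overwrite an already archived segment. A non-zero exit status
        # tells PostgreSQL the archiving failed, and it will retry later.
        return 1
    tmp = dest + ".tmp"
    try:
        with open(src_path, "rb") as src, open(tmp, "wb") as out:
            shutil.copyfileobj(src, out)
            out.flush()
            os.fsync(out.fileno())  # make sure the data is on disk before renaming
        os.rename(tmp, dest)  # the final name only ever refers to a complete file
    except OSError:
        if os.path.exists(tmp):
            os.remove(tmp)
        return 1
    return 0


def main(argv):
    if len(argv) != 2:
        return 2
    return archive_wal(argv[0], argv[1])


# Only act when arguments were passed, as archive_command always does.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Assuming the script were saved as /usr/local/bin/archive_wal.py (a made-up location), it could be wired in with something like `archive_command = 'python3 /usr/local/bin/archive_wal.py %p /mnt/wal_archive'`, where `%p` is replaced by the path of the file to archive.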

I don't know what your second question means.

For the third question, your archive directory generally shouldn't be on the same storage system as your database directory; otherwise you have a single point of failure, which rather defeats the purpose. So your rsync command should go inside archive_command, so that each file is sent wherever you want it to go as soon as it is ready for archival. If you do want your archive_command to copy the data to a local archive, then your hand-rolled rsync should definitely be syncing from that local archive directory, not from pg_xlog.
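As a rough illustration of putting rsync inside archive_command, the sketch below wraps a single rsync call per WAL file. The function names, the script location, and the destination are assumptions made for the example; the `runner` parameter exists only so the command can be exercised without a real rsync.

```python
#!/usr/bin/env python3
"""Hypothetical archive_command wrapper that ships one WAL file with rsync."""
import subprocess
import sys


def build_rsync_command(wal_path, remote_dir):
    """Return the rsync argument list for sending one file into remote_dir."""
    # -a preserves timestamps and permissions. By default rsync writes to a
    # temporary file on the receiver and renames it once the transfer is done.
    return ["rsync", "-a", wal_path, remote_dir.rstrip("/") + "/"]


def ship_wal(wal_path, remote_dir, runner=subprocess.call):
    """Run rsync and return its exit status (0 means the file was shipped)."""
    return runner(build_rsync_command(wal_path, remote_dir))


# Only act when arguments were passed, as archive_command always does.
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(ship_wal(sys.argv[1], sys.argv[2]))
```

With the script at a made-up path, the setting might look like `archive_command = 'python3 /usr/local/bin/ship_wal.py %p standby.example:/var/lib/pgsql/wal_archive'`. Because the exit status of rsync is passed straight back, a failed transfer is reported to PostgreSQL as a failed archive attempt and retried.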

It is possible to combine streaming and log shipping. This is really useful if you also want to be able to do PITR (point-in-time recovery), as then you need the WAL archive anyway. If you are not interested in PITR, then you should probably just use streaming replication, with replication slots to cover long outages. Note that streaming replication requires PostgreSQL 9.0 or later and replication slots require 9.4 or later, so neither is available on 8.4.
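For releases that configure the standby through recovery.conf (9.0 through 11), combining the two amounts to setting both a primary_conninfo and a restore_command on the standby. The sketch below just renders such a file; the function name, the connection string, and the archive path are placeholders, and from PostgreSQL 12 on the same settings live in postgresql.conf alongside a standby.signal file instead.

```python
def render_recovery_conf(primary_conninfo, archive_dir):
    """Return recovery.conf text for a standby that streams and also reads the archive."""
    # Single quotes inside a quoted value are written as two single quotes.
    conninfo = primary_conninfo.replace("'", "''")
    lines = [
        "standby_mode = 'on'",
        "primary_conninfo = '" + conninfo + "'",
        # %f is the WAL file name wanted, %p the path to copy it to.
        "restore_command = 'cp " + archive_dir + "/%f \"%p\"'",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_recovery_conf(
        "host=primary.example port=5432 user=replicator",  # hypothetical primary
        "/mnt/wal_archive",                                 # hypothetical archive mount
    ), end="")
```

With both settings present, the standby can fall back to restore_command for segments that the primary no longer has, which is what covers the longer outages.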