You can use pg_basebackup
to take a copy of the PostgreSQL database over the replication protocol. It copies the whole DB though, it isn't incremental. You'll want to use the --xlog-method=stream
argument unless you have a shared WAL archive set up.
Streaming replication isn't really well suited to the purpose you describe. You can set it to have replication lag, but it's more minutes or hours rather than days, as the amount of WAL you have to retain on the master or in an archive to get days of replication lag is huge. It's also fixed, not schedulable.
You could achieve what you describe with lots of WAL archiving plus a custom-created recovery.conf for each backup, but it'll be fiddly at best.
The thing you need to realise is that by (say) Friday, the master no longer has any record of the state it was in on Monday. It can't update last Friday's snapshot to this Friday because it no longer knows what state it was in on intermediate days like last Monday ... or last Friday. It doesn't know what state the old replica is in, nor how to get it from there to where it is now.
You can tell it to remember that much state by archiving WAL segments, but you need an immense amount of disk space for that and it's very inefficient to catch the backup up from WAL if your DB is busy.
I'd recommend just using pg_basebackup
and/or pg_dump
on a schedule.
This is a really broad topic, and not one I am going to cover in detail here. One point to note however: in my experience, SR can occasionally break due to network issues between datacenters (or ever servers), so disabling log shipping isn't a good idea unless you truly don't have the bandwidth to spare between master and slave(s).
Best Answer
Streaming replication cannot do this (but see Craig Ringer's comment above about a possibly upcoming feature).
The good news is, if you make the standby getting the WAL files and apply them, you can apply an arbitrary delay. We push the WALs to a NFS volume and the
restore command
copies them from there to the local WAL dir, checking if the files are old enough. Then the normal restore process applies them as usual.Of course, the delay will match the preset value only approximately, as a single WAL file may contain changes from quite different time intervals, depending on the master's workload.
What you propose with pausing the restore process depending on the value returned by
pg_last_xact_replay_timestamp()
may work if run from a scheduled job (probably you want to schedule it to run quite frequently). So far I haven't seen such a setup and am wondering if it can be a production-grade solution.