Postgresql – How to schedule PostgreSQL replication

amazon-rdspostgresqlreplicationscheduled-tasksscripting

I was reading postgresql replications solution but, even I just starting understaning how it works, another doubt has arrisen.
I'll be using postgres internal replication solution but as far as I understand, every event will be replicated just as it is achieved; for example, if I insert some data into the master, automatically it will be replicated to the slave…Am I right?

Well, is that so, I was searching for way of scheduling this replications as a passive backup but the thing is, I need for example, every monday on the night the database gets replicated with all the data it had inserted in it that day to a slave database in Amazon EC2 virtual servers. I was thinking on using cron as a first approach but reading postgres documentation I got aware that practically only configuring the necessary files, postgres do it for you…

So, how could I schedule replication in just desired moments?

PD.-The schedule may vary so it needs to be completely generic, also…I'm using Ubuntu 12.04 and PostgreSQL 9.1 as in the master as in the slave.
Thanks

Best Answer

You can use pg_basebackup to take a copy of the PostgreSQL database over the replication protocol. It copies the whole DB though, it isn't incremental. You'll want to use the --xlog-method=stream argument unless you have a shared WAL archive set up.

Streaming replication isn't really well suited to the purpose you describe. You can set it to have replication lag, but it's more minutes or hours rather than days, as the amount of WAL you have to retain on the master or in an archive to get days of replication lag is huge. It's also fixed, not schedulable.

You could achieve what you describe with lots of WAL archiving plus a custom-created recovery.conf for each backup, but it'll be fiddly at best.

The thing you need to realise is that by (say) Friday, the master no longer has any record of the state it was in on Monday. It can't update last Friday's snapshot to this Friday because it no longer knows what state it was in on intermediate days like last Monday ... or last Friday. It doesn't know what state the old replica is in, nor how to get it from there to where it is now.

You can tell it to remember that much state by archiving WAL segments, but you need an immense amount of disk space for that and it's very inefficient to catch the backup up from WAL if your DB is busy.

I'd recommend just using pg_basebackup and/or pg_dump on a schedule.