PostgreSQL Streaming Replication – corruption on master

postgresqlreplication

We have PostgreSQL 9.3.9 streaming replication between Master DB and Slave DB.
Is there issues can happen on Master that can cause persistency problems or full failure of Master and Slave DB both while replicating? Am I always will get persistant and ready to work or restore Slave DB while replicating?

Best Answer

Is there issues can happen on Master that can cause persistency problems or full failure of Master and Slave DB both while replicating?

Yes, there are.

Anything that corrupts the transaction logs on the master before they're archived or streamed to the replica will do it - and I've seen that happen personally. The root cause was administrator error, in that the admin kill -9'd the postmaster, deleted postmaster.pid and then started a new postmaster up. Nonetheless, the corrupt WAL was replicated.

Memory errors that corrupt shared memory, flip a bit, etc could do it.

Severe filesystem problems affecting the transaction logs could do it.

A nasty PostgreSQL bug could do it. We actually had one of those in 9.3 too, though it's fixed prior to 9.3.9. To learn more than you ever wanted to about it, search for "multixact wrap-around".

This is why I strongly recommend that you keep point-in-time recovery using WAL archives and periodic base backups. The pgbarman utility helps with this.

Periodic text-format dumps can also be a good idea as extra insurance.

Am I always will get persistant and ready to work or restore Slave DB while replicating?

No, because some kinds of corruption can be replicated to the slave.

The most obvious one is if you DROP TABLE critical_production_data; the system will obediently replicate that to the slave before you can even say "Oh sh...".

Point in time recovery is important.