PostgreSQL – Replication has failed; how to get it going again

postgresql-9.1

I'm running Postgres 9.1.6 on Ubuntu and I have streaming replication set up between a master and a slave. Everything had been running smoothly until the database crashed and we had to restart both of the boxes.

Now, replication has stopped and when checking the logs on both boxes, I see this message:

CDT FATAL: requested WAL segment 0000000100000224000000FA has already been removed

It's the same segment over and over again. From my Googling, it would seem that the replication server is trying to retrieve this segment from the master, but it's not there anymore. OK, but how do I get around this? Do I need to make a fresh backup and rsync that over to the slave? Is there an easy way to get the slave back in sync?

Best Answer

Yes, you will have to give the slave a new base backup of the master (for streaming replication, only steps 1 to 4).
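
As one possible way to take that base backup on 9.1, the pg_basebackup tool that ships with PostgreSQL 9.1 can copy the master's data directory over a replication connection. This is not necessarily the procedure the steps above refer to, just a minimal sketch. The host name, role name, and data directory below are placeholders, and it assumes the slave is stopped, the target directory is empty, and the master's pg_hba.conf and max_wal_senders already allow a replication connection. The helper only builds and prints the command; it does not run it.

```python
import shlex


def basebackup_command(master_host, repl_user, data_dir):
    """Build a pg_basebackup invocation for PostgreSQL 9.1 (hypothetical helper)."""
    return [
        "pg_basebackup",
        "-h", master_host,  # master to copy from (placeholder host)
        "-U", repl_user,    # role allowed to make replication connections (placeholder)
        "-D", data_dir,     # empty target data directory on the slave (placeholder)
        "-x",               # include the WAL files needed to make the backup consistent
        "-P",               # report progress while copying
    ]


if __name__ == "__main__":
    cmd = basebackup_command(
        "master.example.com", "replicator", "/var/lib/postgresql/9.1/main"
    )
    # Print the command so it can be reviewed before being run by hand.
    print(" ".join(shlex.quote(part) for part in cmd))
```

After the copy finishes, the slave still needs its recovery.conf put back in the new data directory before it is started, since the fresh copy comes from the master and will not contain one.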

Your problem has probably occurred because the value of wal_keep_segments is too low. The value needs to be high enough that when the slave is down for some time, the master won't start recycling segments the slave hasn't processed yet.
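
To pick a value, a rough estimate can be worked out from how much WAL the master writes and how long the slave might be down. The sketch below assumes the default WAL segment size of 16 MB; the write rate, outage length, and safety factor are made-up inputs that you would replace with your own measurements.

```python
import math

WAL_SEGMENT_MB = 16  # default WAL segment size; differs only on custom builds


def wal_keep_segments_for(wal_mb_per_hour, max_downtime_hours, safety_factor=1.5):
    """Estimate a wal_keep_segments value (hypothetical helper, rough sizing only)."""
    # WAL the master is expected to produce while the slave is away, with headroom.
    total_mb = wal_mb_per_hour * max_downtime_hours * safety_factor
    return math.ceil(total_mb / WAL_SEGMENT_MB)


if __name__ == "__main__":
    # Assumed figures: 512 MB of WAL per hour, slave down for up to 4 hours.
    segments = wal_keep_segments_for(512, 4)
    print("wal_keep_segments =", segments)
    print("extra disk kept in pg_xlog (MB):", segments * WAL_SEGMENT_MB)
```

Keep in mind that the retained segments take up space in the master's pg_xlog directory, so the value is a trade-off between disk usage and how long an outage the slave can survive without a new base backup.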