Does anyone know if there's a way to resend or regenerate a WAL file that was missed by the standby server? During my testing I've gotten into a situation where the main server fails to send a file or 2 due to some issue on the standby server. Once I correct the problem then the main server is able to send files just fine except it does not attempt to resend the files it missed and so the standby server gets "stuck" waiting for file x. In the mean time WAL files are stacking up in my wal archive directory on the standby server, they don't get processed because postgresql is waiting for a specific file.
So what I'd like to know is if there's a way to resend and/or regenerate this file on the primary server. If so, how? Or is that data lost? and if so, how do I tell the standby server to stop waiting for that file and move on to the next?
I'm using omnipitr-archive and omnipitr-restore commands for my archiving and restoring.
Best Answer
Your
archive_command
is buggy.A correct archive command will never return zero (success) until the WAL of interest is safely archived. If the command returns non-zero (failure) then PostgreSQL will retry archival that WAL segment until it succeeds and will not discard the WAL segment until it's archived.
Either your
archive_command
is returning true before the archive is truly safely stored, or there's something wrong with therestore_command
that's causing it to lose/remove WAL segments.What you describe should not happen in normal, correct operation.
Now, if somehow you do land up with a gap of WAL segments between the standby's current position and the oldest available WAL segment, the only thing you can do is re-initialise the standby. The simplest and safest way to do this is with
pg_basebackup -X stream
. If you want to get more sophisticated and save some time/bandwidth you can instead use rsync to copy just changed parts of files:pg_start_backup()
on the masterrsync -a --delete
the master data dir to the standby data dirpg_stop_backup()
on the master