I have exactly the same problem as in "WAL segment still exists on master, but logs on slave and master say it's been removed", except that mine is in production!
PostgreSQL 9.4 / Debian Jessie
postgresql.conf:
wal_level = hot_standby
max_wal_senders = 5
pg_hba.conf:
host replication repl x.x.x.0/20 trust
On the slave replica, recovery.conf:
standby_mode = on
primary_conninfo = 'host=<master_server_ip> port=xxxx user=xxxx password=xxxx'
trigger_file = '/var/lib/postgresql/trigger_failover'
Error on master:
ERROR: requested WAL segment 000000030000007E00000054 has already been removed
Error on slave replica:
LOG: started streaming WAL from primary at 7E/54000000 on timeline 3
UTC [31252-2] FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000030000007E00000054 has already been removed
What can I do to fix this without stopping production?
It has not been working for several months now, so I guess there is a lot to catch up on. I can't even find the archives in ./postgresql/9.4/main/mnt/server/archivedir/
Best Answer
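It sounds like the standby is now several months out of date, so it's probably not useful at this point: the WAL it needs has already been removed from the primary, and there are no archives to replay it from. Replace the standby with a fresh base backup (for example, taken with pg_basebackup) and start streaming again; taking a base backup does not require stopping the primary. This time you may want to use replication slots, so that the primary won't remove WAL before the standby has received it.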
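A minimal sketch of what the replication-slot setup could look like on 9.4, reusing the settings from the question and assuming a slot named standby1 (a hypothetical name). Slots are enabled on the primary with max_replication_slots, which can only be set at server start, so this one change does need a restart of the primary. The slot is then created on the primary with SELECT pg_create_physical_replication_slot('standby1'); and the new standby refers to it by name:

```
# postgresql.conf on the primary
wal_level = hot_standby
max_wal_senders = 5
max_replication_slots = 5        # 9.4+; only takes effect after a restart

# recovery.conf on the new standby
standby_mode = on
primary_conninfo = 'host=<master_server_ip> port=xxxx user=xxxx password=xxxx'
primary_slot_name = 'standby1'   # hypothetical name; create the slot on the primary first
trigger_file = '/var/lib/postgresql/trigger_failover'
```

Two caveats. On 9.4 a newly created physical slot only starts retaining WAL once the standby first connects, so take the base backup with pg_basebackup -X stream to make sure it carries the WAL it needs. Also, a slot keeps WAL on the primary until the standby has consumed it, so if the standby is ever abandoned again, pg_xlog will grow until the disk fills. Watch pg_replication_slots and remove unused slots with pg_drop_replication_slot().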