PostgreSQL – Error on PostgreSQL streaming replication


I have exactly the same problem as in "Wal segment still exists on master, but logs on slave and master say it's been removed", except that mine is in production!

PostgreSQL 9.4 / Debian Jessie

postgresql.conf:

wal_level = hot_standby
max_wal_senders = 5

pg_hba.conf:

host    replication     repl        x.x.x.0/20               trust

On the slave replica, recovery.conf:

standby_mode = on
primary_conninfo = 'host=<master_server_ip> port=xxxx user=xxxx password=xxxx'
trigger_file = '/var/lib/postgresql/trigger_failover'

Error on the master:

ERROR:  requested WAL segment 000000030000007E00000054 has already been removed

Error on the slave replica:

LOG:  started streaming WAL from primary at 7E/54000000 on timeline 3
UTC [31252-2] FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000030000007E00000054 has already been removed

What can I do to fix this without stopping production?

It has not been working for several months now, so I guess there is a lot to recover. I can't even find the WAL archives in ./postgresql/9.4/main/mnt/server/archivedir/

Best Answer

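It sounds like the standby is now several months out of date, so it is probably not useful at this point: the WAL segments it needs have already been removed from the primary, and apparently there is no archive to fetch them from either. You can replace the standby with a fresh base backup of the primary (pg_basebackup takes one from the running server, so the primary does not need to be stopped) and start streaming again. This time, you may want to use a replication slot (new in 9.4) so that the primary does not remove WAL segments before the standby has received them.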
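A minimal sketch of that setup, assuming PostgreSQL 9.4 on both servers with Debian's default layout. The slot name standby_1 and the data directory /var/lib/postgresql/9.4/main are illustrative assumptions; the host, port, user and password placeholders are kept from the question. One caveat given the goal of not stopping production: in 9.4, max_replication_slots defaults to 0 and can only be changed at server start, so enabling slots needs one restart of the primary, while taking the base backup does not.

# Primary, postgresql.conf (in addition to the existing wal_level and max_wal_senders)
max_replication_slots = 1    # 9.4 default is 0 (slots disabled); needs a restart to change

# Primary: create the slot once, e.g. from psql (slot name is hypothetical)
#   SELECT pg_create_physical_replication_slot('standby_1');

# Standby: stop it and empty its stale data directory (pg_basebackup needs an
# empty target), then copy the running primary; the path is an assumption
#   pg_basebackup -h <master_server_ip> -p xxxx -U repl -D /var/lib/postgresql/9.4/main -X stream -P

# Standby, recovery.conf (placed in the new data directory before starting it)
standby_mode = on
primary_conninfo = 'host=<master_server_ip> port=xxxx user=xxxx password=xxxx'
primary_slot_name = 'standby_1'    # must match the slot created on the primary
trigger_file = '/var/lib/postgresql/trigger_failover'

Once the standby connects, the pg_replication_slots view on the primary should show the slot as active. The trade-off is the opposite failure mode from the one in the question: if the standby stays disconnected, the primary keeps every WAL segment the slot still needs, and pg_xlog can fill the disk. Monitor the slot, and drop it with pg_drop_replication_slot if the standby is decommissioned.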