PostgreSQL replication not working after base backup

Tags: linux, master-slave-replication, postgresql, postgresql-9.4, replication

Last week we had an incident with our master PostgreSQL server instance, which got totally trashed, so we had to switch to our slave as our one and only DB instance. Once the initial chaos was under control, we tried to create a new slave replica from scratch, following instructions left by our DB provider, who set up the original master-slave pair. Just yesterday I realized that the new slave isn't replicating at all and only has the contents of the initial base backup, so we have no redundancy right now in case of a new disaster!

In recovery.conf on the new slave we have:

standby_mode = on
primary_conninfo = 'host=10.1.1.65 port=5432 user=replicador password=XXXXXXXX'

In /var/lib/postgresql.log (master) there are several messages like this:

2017-03-17 12:21:57 UTC [1969-1] replicador@[unknown] ERROR:  requested WAL segment 0000000100000B280000003A has already been removed

And in the replica's log file, many messages like:

2017-03-22 13:22:32 UTC [2827-1] LOG:  started streaming WAL from primary at B28/3A000000 on timeline 1
2017-03-22 13:22:32 UTC [2827-2] FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 0000000100000B280000003A has already been removed
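
For reference, the segment name in both errors encodes the timeline and the WAL position, which helps when comparing the two logs. The following is a minimal sketch in Python, not part of PostgreSQL. It assumes the default 16 MB WAL segment size, and the function name decode_wal_segment is hypothetical:

```python
def decode_wal_segment(name, segment_size=16 * 1024 * 1024):
    """Split a 24-hex-digit WAL segment file name into (timeline, start LSN).

    Assumes the default 16 MB segment size; a server built with a
    different size would need a different segment_size value.
    """
    if len(name) != 24:
        raise ValueError("expected a 24-character WAL segment name")
    timeline = int(name[0:8], 16)    # first 8 hex digits: timeline ID
    log_id = int(name[8:16], 16)     # middle 8 hex digits: high part of the LSN
    seg_no = int(name[16:24], 16)    # last 8 hex digits: segment number
    start_offset = seg_no * segment_size
    return timeline, "%X/%X" % (log_id, start_offset)


timeline, lsn = decode_wal_segment("0000000100000B280000003A")
print(timeline, lsn)  # prints: 1 B28/3A000000
```

The result matches the replica's own log line: it asks for WAL starting at B28/3A000000 on timeline 1, and that is the segment the master reports as already removed.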

It seems the replica wasn't fast enough to keep up with the master. What parameters should I check or raise? Can I actually re-sync, or is this replica lost, so that we'll have to do it all over again?

What could be missing? What additional info should I provide?

Environment:

select version()
PostgreSQL 9.4.9 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit

lsb_release -a
Distributor ID: Debian
Description:    Debian GNU/Linux 8.6 (jessie)
Release:    8.6
Codename:   jessie

uname -a
Linux ip-10-1-0-139 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u2 (2016-10-19) x86_64 GNU/Linux

Best Answer

Your base backup looks old: the current master (the former slave) must be on a timeline greater than 1 after being promoted, but your new slave is still on timeline 1. You can either create a new slave from scratch by taking a fresh base backup from the new master, or try a timeline switch on the current non-replicating slave (I'm not sure this will work). For the second option, see http://paquier.xyz/postgresql-2/postgres-9-3-feature-highlight-timeline-switch-of-slave-node-without-archives/
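
On the question of which parameters to check: if you rebuild the slave from a fresh base backup, the "has already been removed" error can come back whenever the master recycles WAL segments before the slave has fetched them. One setting worth checking on the master in 9.4 is wal_keep_segments. The fragment below is only a sketch. The value is an illustrative assumption, not a recommendation, and should be sized to your write volume and free disk space:

```
# postgresql.conf on the current master (PostgreSQL 9.4)
# Keep at least this many finished WAL segments in pg_xlog for standbys.
# 256 segments x 16 MB (default segment size) = 4 GB of WAL retained.
wal_keep_segments = 256    # illustrative value, adjust to your workload
```

A WAL archive with restore_command in recovery.conf, or a replication slot (new in 9.4), are other ways to stop the master from discarding WAL that the slave still needs.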