PostgreSQL – WAL segment still exists on master, but logs on slave and master say it’s been removed

postgresql replication

I'm trying to set up streaming replication on PostgreSQL 9.5.

The master and slave are configured as below and WAL files are accumulating on the master. However, something is wrong as I get complaints that the WAL files are missing:

Slave:

FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 0000000200000059000000BA has already been removed

Master:

repuser@[unknown] ERROR: requested WAL segment 0000000200000059000000BA has already been removed

The WAL file does exist on the master, and the slave will restore happily if I ship the WAL files over and use the restore_command option in recovery.conf.

postgres$ ls -l /db/archivedir/0000000200000059000000BA
-rw------- 1 postgres postgres 16777216 Mar 31 10:18 /db/archivedir/0000000200000059000000BA

Master config – postgresql.conf:

wal_level = hot_standby
archive_mode = on
archive_command = 'test ! -f /db/archivedir/%f && cp %p /db/archivedir/%f'
max_wal_senders = 3

The master also has a replication slot configured:

brp=# SELECT * FROM pg_replication_slots;
slot_name | plugin | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn 
----------+--------+-----------+--------+----------+--------+------------+------+--------------+-------------
brp_uk    |        | physical  |        |          | f      |            |      |              | 

Slave – postgresql.conf:

hot_standby = on

Slave config – recovery.conf:

standby_mode = on
primary_conninfo = 'host=xxx port=5434 user=repuser password=xxx'
trigger_file = '/tmp/postgresql.trigger.5434'
primary_slot_name = 'brp_uk'

Then pg_basebackup is run and the slave is started.

The slave has all the data as of the time of the backup, but it receives no new data from the WAL files and reports the error above.

What have I mis-configured?

Best Answer

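It is as dezso alluded to in the comment. You get the error because the WAL segment has already been archived and then removed from the master's pg_xlog directory. Streaming replication serves WAL only from pg_xlog, not from the archive directory, so the copy in /db/archivedir does not help the WAL sender.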

You can configure how many segments the master keeps in pg_xlog with the wal_keep_segments setting. Set it to at least the number of WAL segments generated during the base backup, plus some extra as a buffer.
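As a rough sketch, the setting goes in postgresql.conf on the master. The value 64 is only an illustrative assumption, not a recommendation. With the 16 MB segments shown in the ls output above, it retains about 1 GB of WAL; size it to however much WAL your own base backup takes to complete.

```conf
# postgresql.conf on the master (PostgreSQL 9.5)
# Minimum number of completed WAL segments to keep in pg_xlog for standbys.
# 64 is an assumed example value: 64 x 16 MB = 1 GB of retained WAL.
wal_keep_segments = 64
```

Note that this is a minimum. The master may hold more segments than this, but it will not recycle these until they fall outside the window.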

The standby doesn't connect to the replication slot until it is ready to use it, which is after the base backup has finished. Until then the slot has not been activated and is therefore not retaining the older WAL segments. This matches the pg_replication_slots output in the question, where active is f and restart_lsn is empty.
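Since the question mentions that restoring from shipped WAL files works, another option is to keep restore_command in recovery.conf alongside streaming, so the standby can fetch from the archive any segment the master has already removed. The fragment below is a sketch under one big assumption: that /db/archivedir (or a copy of it) is readable on the standby at the same path, for example over NFS or via rsync. The other values are taken from the question.

```conf
# recovery.conf on the standby (PostgreSQL 9.5)
standby_mode = on
primary_conninfo = 'host=xxx port=5434 user=repuser password=xxx'
primary_slot_name = 'brp_uk'
trigger_file = '/tmp/postgresql.trigger.5434'

# Assumption: the master's archive directory is reachable at this path
# on the standby. Used to fetch segments no longer in the master's pg_xlog.
restore_command = 'cp /db/archivedir/%f %p'
```

In standby mode the server tries the archive through restore_command and switches to streaming once it has caught up, so the two mechanisms can be configured together.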