Postgresql – Replica is stuck waiting for

postgresqlreplication

Running two PostgreSQL 9.2.3 instances. One is master and other is slave.

All was working for awhile, I could make changes on master and see them on the slave but now my slave is not getting changes any more.

1000     25069  0.2  0.1 131616 11280 pts/0    S    01:02   0:00 /usr/local/pgsql/bin/postmaster -D /mnt/pgdb/v9.2.3 -i
1000     25074  0.0  0.0 131724  1440 ?        Ss   01:02   0:00 postgres: startup process   waiting for 000000040000000700000083

I've tried countless things including completely blowing away the slave, running initdb and rsyncing the master back to the slave, launching the slave only to find it still stuck in this same dam state.

What the hell am I doing wrong here?

Slave's recovery.conf looks as follows:

standby_mode = 'on'
primary_conninfo = 'host=172.16.0.14'
trigger_file = '/tmp/pgfailover'
restore_command = 'cd .'

I've also tried the following on the master to no avail:

$PGDIR/psql -U $UNAME -d postgres -c "select pg_start_backup('clone',true);"
rsync -av --exclude pg_xlog --exclude postgresql.conf /mnt/pgdb/current/* 172.16.0.8:/mnt/pgdb/current/
$PGDIR/psql -U $UNAME -d postgres -c "select pg_stop_backup();"

UPDATE – Including relevant information pertaining to replicate problem.

postgresql.conf on the Master node

wal_level = hot_standby
archive_mode = on
archive_command = 'cd .'

max_wal_senders = 3
wal_keep_segments = 5000
hot_standby = on
hot_standby_feedback = on

The Slave Node's postgresql.conf has these settings:

wal_level = hot_standby
archive_mode = on
archive_command = 'cd .'

max_wal_senders = 3
hot_standby = on
hot_standby_feedback = on

Best Answer

It looks like the standby doesn't successfully connect to the master.

In order to be able to use the streaming replication, the master needs to have enough WALs in the streaming buffer that it can send to the standby. Since you're not using log shipping, it needs to store all the logs from when you started the backup to when the standby has been started.

You can control the number of WAL's stored in the buffer by setting wal_keep_segments to a high enough value.

You should also make sure that max_wal_senders is set to the number of standby servers you have.

However, it is always best to have the WAL restore set as a backup for when the streaming replication doesn't work for any reason. You can achieve that by setting the archive_command in postgresql.conf and the restore_command in recovery.conf

For instance like this, assuming that /pgsql/backups/archive_logs is mounted as a nfs share accessible on both servers:

archive_command = 'cp %p /pgsql/backups/archive_logs/%f'
restore_command = 'cp /pgsql/backups/archive_logs/%f %p'