PostgreSQL – High Delay Lag Between Master/Slave in Postgres 9.3

postgresqlpostgresql-9.3recovery

I have a Postgres 9.3 database server with this configuration:

# grep archive_ postgresql.conf
archive_mode = on            # allows archiving to be done
archive_command = 'rsync -a %p postgres@backupserver:9.3/standby/archive/%f'            # command to use to archive a logfile segment                                                                                                                      
archive_timeout = 300            # force a logfile segment switch after this

and another server that follows in hot standby mode.

I expect a WAL packet to be sent when it becomes full or 5 minutes (300 seconds).

If I run this query in the hot standby:

SELECT CASE
WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location() THEN 0
ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
END

Then I get a mean value of about 1500 seconds, that is about 5 times bigger than the configured 300 seconds.

Why?

In other servers with a lot of traffic the delay is small.

Related questions:
When will PostgreSQL execute archive_command to archive wal files?

Best Answer

In other servers with a lot of traffic the delay is small.

By this statement, I am assuming that this server does not have a lot of traffic.

If there are no transactions to replay, then the value of pg_last_xact_replay_timestamp() will not advance merely due to the passage of time. Also, archive_timeout will only kick in when there is something to archive. If there is no new WAL data since the last archive switch, then the existing WAL file (which is empty except for the initial header) will not be archived just because 5 minutes have passed.

A possible work around is to set checkpoint_timeout to 300 seconds as well. In 9.3 with this setting, the checkpoint will create records to be archived, and the archival will create work to be checkpointed, so you will generate a nearly empty file every 5 minutes on an idle server. Most people consider this to be a bug, but perhaps it is a feature for you.