Postgresql – Monitoring streaming replication

monitoringpostgresqlreplication

We're setting up streaming replication with one master and one (read-only) slave. We're now looking for a way to monitor the replication; mostly to verify that the slave server is still up-to-date.

Is it possible to do this — preferably from the slave? If so, how?

The documentation on streaming replication mentions the following:

On systems that support the keepalive socket option, setting tcp_keepalives_idle, tcp_keepalives_interval and tcp_keepalives_count helps the primary promptly notice a broken connection.

What does "promptly notice" mean in this context? Is there some error log where a broken connection gets reported?

Best Answer

We're now looking for a way to monitor the replication; mostly to verify that the slave server is still up-to-date.

For monitoring replica lag, there are several ways that give slightly different answers, depending on which version of Postgres you are using. A simple query that can be done directly on the standby is:

SELECT (CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()
        THEN 0
        ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
        END) AS log_delay;

Note this check will not report "lag" in cases where, say, the connection to the primary has been severed, since this check of difference between XLOG segments received and applied doesn't know anything about XLOG segments which have not been received yet -- there are additional checks of pg_stat_replication on the primary which may be helpful as well, see e.g. here.

What does "promptly notice" mean in this context?

In this context, "promptly notice" is talking about how much time may or must elapse before a TCP client (i.e. the standby) notices that its connection to the other side has been severed. (I think the docs should really say "helps the standby" not "helps the primary" here, since it is the standby which must detect a broken connection and re-establish a new connection to the primary.)

See also this relevant question talking about how keepalive settings affect this time.

Is there some error log where a broken connection gets reported?

I believe that you will see messages in the postmaster logs on both the primary and the standby side (particularly if you have log_connections / log_disconnections enabled) when SR connections are dropped or re-established.