We're setting up streaming replication with one master and one (read-only) slave. We're now looking for a way to monitor the replication; mostly to verify that the slave server is still up-to-date.
Is it possible to do this — preferably from the slave? If so, how?
The documentation on streaming replication mentions the following:
On systems that support the keepalive socket option, setting tcp_keepalives_idle, tcp_keepalives_interval and tcp_keepalives_count helps the primary promptly notice a broken connection.
What does "promptly notice" mean in this context? Is there some error log where a broken connection gets reported?
Best Answer
For monitoring replica lag, there are several ways that give slightly different answers, depending on which version of Postgres you are using. A simple query that can be done directly on the standby is:
Note this check will not report "lag" in cases where, say, the connection to the primary has been severed, since this check of difference between XLOG segments received and applied doesn't know anything about XLOG segments which have not been received yet -- there are additional checks of
pg_stat_replication
on the primary which may be helpful as well, see e.g. here.In this context, "promptly notice" is talking about how much time may or must elapse before a TCP client (i.e. the standby) notices that its connection to the other side has been severed. (I think the docs should really say "helps the standby" not "helps the primary" here, since it is the standby which must detect a broken connection and re-establish a new connection to the primary.)
See also this relevant question talking about how keepalive settings affect this time.
I believe that you will see messages in the postmaster logs on both the primary and the standby side (particularly if you have
log_connections
/log_disconnections
enabled) when SR connections are dropped or re-established.