Postgresql – Monitoring streaming replication

monitoringpostgresqlreplication

We're setting up streaming replication with one master and one (read-only) slave. We're now looking for a way to monitor the replication; mostly to verify that the slave server is still up-to-date.

Is it possible to do this — preferably from the slave? If so, how?

The documentation on streaming replication mentions the following:

On systems that support the keepalive socket option, setting tcp_keepalives_idle, tcp_keepalives_interval and tcp_keepalives_count helps the primary promptly notice a broken connection.

What does "promptly notice" mean in this context? Is there some error log where a broken connection gets reported?

Best Answer

We're now looking for a way to monitor the replication; mostly to verify that the slave server is still up-to-date.

For monitoring replica lag, there are several ways that give slightly different answers, depending on which version of Postgres you are using. A simple query that can be done directly on the standby is:

SELECT (CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()
        THEN 0
        ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
        END) AS log_delay;

Note this check will not report "lag" in cases where, say, the connection to the primary has been severed, since this check of difference between XLOG segments received and applied doesn't know anything about XLOG segments which have not been received yet -- there are additional checks of pg_stat_replication on the primary which may be helpful as well, see e.g. here.

What does "promptly notice" mean in this context?

In this context, "promptly notice" is talking about how much time may or must elapse before a TCP client (i.e. the standby) notices that its connection to the other side has been severed. (I think the docs should really say "helps the standby" not "helps the primary" here, since it is the standby which must detect a broken connection and re-establish a new connection to the primary.)

See also this relevant question talking about how keepalive settings affect this time.

Is there some error log where a broken connection gets reported?

I believe that you will see messages in the postmaster logs on both the primary and the standby side (particularly if you have log_connections / log_disconnections enabled) when SR connections are dropped or re-established.

Related Solutions

Postgresql – Streaming Replication in PostgreSQL

PostgreSQL replicas never finish recovering. This is by design. Basically a replica is always in "recovering from disaster" mode except that it is using receiving the WAL segments from the master rather than on disk.

So what you are seeing is not cause for concern. If it is not working yet, then you will need to provide a more detailed description of what you are trying to do and what is not working. But as far as you are posting it seems normal.

Postgresql – Streaming replication – STONITH

well, you cannot switch back easily. what you can do is to use the old backup method:

SELECT pg_start_backup('whatever');
rsync the data dir
SELECT pg_stop_backup();

this is the best way as not all the data has to be transported over the network. i definitely recommend this replacement for pg_basebackup because it is the only RELIABLE way to get things back in order (nobody knows for sure what happens before the crash). sure, there might be other tricks out there but i would strongly vote for this method.

Best Answer

Related Solutions

Postgresql – Streaming Replication in PostgreSQL

Postgresql – Streaming replication – STONITH

Related Question