Postgresql – How to monitor PostgreSQL WAL shipping if I’m not using streaming replication

log-shippingpostgresqlreplication

We have a fairly simple setup to replicate from our on-premise master PostgreSQL database to our presentation tier in AWS. We're using WAL shipping using the archive_command setting. Basically the setup looks like this:

    +-------------+
    |   Master    |
    +-------------+
     WAL  |     
segments  |              
         \|/                     +--------------+
    +-------------+   WAL      +-+------------+ |      
    |             |----------->| Hot Standby  | |
    |      S3     |  segments  |   Slaves     | |
    |             |            |              |-+
    +-------------+            +--------------+

This setup seems to generally be fairly robust, but I haven't come up with a good way to detect failures, either the master failing to push archives up or a slave or slaves failing to retrieve the log files. What's a good way to determine if a slave is up-to-date with respect to master? What's a good way to determine if the master has failed to ship a WAL file?

Just to clarify, we are using the slaves strictly as read replicas, we will never failover to them.

Best Answer

Here are a few ways:

  • Compare pg_current_xlog_location() on the primary with pg_last_xlog_replay_location() on the standby. That will give you the lag in bytes, which might not be very useful for alerting, but it can be useful to chart it.
  • Monitor pg_last_xact_replay_timestamp() on the standby against the current time.
  • Have a cron job on the primary periodically modify a value and then check how long it takes to get to the standby. That's essentially how it used to be done before pg_last_xact_replay_timestamp() became available.