PostgreSQL streaming replication lag for synchronous standbys

postgresqlpostgresql-9.5replication

We use PostgreSQL 9.5 in a master + synchronous hot-standby setup, i.e., using synchronous_commit = on and using synchronous_standby_names='*'.

Consequently, the standby has sync_state = sync in the master's pg_stat_replication_table.

We also use postgres_exporter, which uses the following statement to extract the system's replication lag from the master's stats.

SELECT *, pg_current_xlog_location(),
       pg_xlog_location_diff(pg_current_xlog_location(), replay_location)::float
FROM pg_stat_replication;

Question: Does the above call to pg_xlog_location_diff always return 0 in a synchronous setup, or are there any cases when there is a measurable replication lag?

To my understanding, if the above statement shows a lag, it means that the master has committed a transaction without waiting for the standby, which should never happen in a synchronous setup.

Note that we moreover use replication slots, have setup the cluster using repmgr, and do not override synchronous_commit locally for individual transactions.

Best Answer

The pg_xlog_location_diff within your query may return more than 0.

There are three writing timing of WAL data to WAL segment file: 1) When a transaction is committed, 2) When WAL buffer is filled up, 3) When WAL writer process writes periodically (default is 200 milliseconds).

And basic sequence of streaming replication is shown below:

  1. (master) Write WAL data.
  2. (master) Send WAL data to a slave.
  3. (slave) Receive WAL data and replay it.
  4. (slave) Return some information (e.g. replay location, receive location, flushed location.)

I show an extreme example to clarify the problem. If you run a long transaction to insert huge data, master continues to send WAL data till the transaction is committed. When you issue your query during the period from 2 to 3, pg_current_xlog_location() returns the current location whereas replay_location returns the previous replay location, pg_xlog_location_diff(pg_current_xlog_location(), replay_location) therefore returns more than 0.

Other case is the following: The configuration parameter max_standby_*_delay is enable and a conflict occurs, your standby suspends replaying WAL data, therefore pg_xlog_location_diff() returns more than 0.

(Both of examples are logical extreme cases.)

In any case, I think your problem does not depend on whether replication mode is sync or async. The difference between sync and async is that the master confirms a synchronous slave commits transactions or not. Your problem is related to general sequence of WAL data sending and replaying, but is not depend on the synchronous mode.

(Consider how a long transaction is committed in synchronous mode. I discussed what happens when your query is issued during the period from 2 to 3 (it's very short period) in the streaming replication sequence I showed. I did not discussed when transaction is committed because it is not essential of the answer, i think.)