We use PostgreSQL 9.5 in a master + synchronous hot-standby setup, i.e., using synchronous_commit = on
and using synchronous_standby_names='*'
.
Consequently, the standby has sync_state = sync
in the master's pg_stat_replication_table
.
We also use postgres_exporter, which uses the following statement to extract the system's replication lag from the master's stats.
SELECT *, pg_current_xlog_location(),
pg_xlog_location_diff(pg_current_xlog_location(), replay_location)::float
FROM pg_stat_replication;
Question: Does the above call to pg_xlog_location_diff
always return 0
in a synchronous setup, or are there any cases when there is a measurable replication lag?
To my understanding, if the above statement shows a lag, it means that the master has committed a transaction without waiting for the standby, which should never happen in a synchronous setup.
Note that we moreover use replication slots, have setup the cluster using repmgr
, and do not override synchronous_commit
locally for individual transactions.
Best Answer
The pg_xlog_location_diff within your query may return more than 0.
There are three writing timing of WAL data to WAL segment file: 1) When a transaction is committed, 2) When WAL buffer is filled up, 3) When WAL writer process writes periodically (default is 200 milliseconds).
And basic sequence of streaming replication is shown below:
I show an extreme example to clarify the problem. If you run a long transaction to insert huge data, master continues to send WAL data till the transaction is committed. When you issue your query during the period from 2 to 3,
pg_current_xlog_location()
returns the current location whereasreplay_location
returns the previous replay location,pg_xlog_location_diff(pg_current_xlog_location(), replay_location)
therefore returns more than 0.Other case is the following: The configuration parameter
max_standby_*_delay
is enable and a conflict occurs, your standby suspends replaying WAL data, thereforepg_xlog_location_diff()
returns more than 0.(Both of examples are logical extreme cases.)
In any case, I think your problem does not depend on whether replication mode is sync or async. The difference between sync and async is that the master confirms a synchronous slave commits transactions or not. Your problem is related to general sequence of WAL data sending and replaying, but is not depend on the synchronous mode.
(Consider how a long transaction is committed in synchronous mode. I discussed what happens when your query is issued during the period from 2 to 3 (it's very short period) in the streaming replication sequence I showed. I did not discussed when transaction is committed because it is not essential of the answer, i think.)