I set up Barman using WAL streaming, and I just noticed I have a huge replay_lag. I'd like to keep it down to 0, but I have no idea how to do this.
I also have a database replica and it is working fine.
I have Barman 2.7 and PostgreSQL 10.8.
When I check the PostgreSQL replication status, I see this:
select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid | 23095
usesysid | 169593
usename | repmgr
application_name | replication_server
client_addr | xxx.16.2.66
client_hostname | replica_server
client_port | 51164
backend_start | 2019-07-05 23:03:03.194165-05
backend_xmin |
state | streaming
sent_lsn | 22/30884870
write_lsn | 22/30884870
flush_lsn | 22/30884870
replay_lsn | 22/30884870
write_lag |
flush_lag |
replay_lag |
sync_priority | 0
sync_state | async
-[ RECORD 2 ]----+------------------------------
pid | 6689
usesysid | 66019
usename | streaming_barman
application_name | barman_receive_wal
client_addr | xxx.172.16.109
client_hostname | barman_server
client_port | 40680
backend_start | 2019-07-16 00:00:06.903489-05
backend_xmin |
state | streaming
sent_lsn | 22/30884870
write_lsn | 22/30884870
flush_lsn | 22/30000000
replay_lsn |
write_lag | 00:00:03.01204
flush_lag | 00:00:01.085313
replay_lag | 583:31:21.110173
sync_priority | 0
sync_state | async
select * from pg_replication_slots;
-[ RECORD 1 ]-------+----------------
slot_name | barman_slot
plugin |
slot_type | physical
datoid |
database |
temporary | f
active | t
active_pid | 6689
xmin |
catalog_xmin |
restart_lsn | 22/30000000
confirmed_flush_lsn |
barman status master_server
Server master_server:
Description: master_server - streaming
Active: True
Disabled: False
PostgreSQL version: 10.8
Cluster state: in production
pgespresso extension: Not available
Current data size: 11.4 GiB
PostgreSQL Data directory: /var/lib/postgresql/10/main
Current WAL segment: 000000010000002200000030
PostgreSQL 'archive_command' setting: barman-wal-archive barman_server master_server %p
Last archived WAL: 00000001000000220000002F, at Fri Aug 9 06:56:05 2019
Failures of WAL archiver: 63 (000000010000001C00000062 at Mon Jul 15 23:59:21 2019)
Server WAL archiving rate: 2.60/hour
Passive node: False
Retention policies: enforced (mode: auto, retention: RECOVERY WINDOW OF 1 MONTHS, WAL retention: MAIN)
No. of available backups: 1
First available backup: 20190809T063454
Last available backup: 20190809T063454
Minimum redundancy requirements: satisfied (1/1)
Can anyone point me to a resource I can check to figure out how to fix these WAL archiver failures and the replay_lag?
Thanks in advance
Best Answer
The large replay_lag in pg_stat_replication seems to belong to a pg_receivewal process. Now, pg_receivewal writes a copy of the WAL files, but it does not apply them anywhere. Consequently, it will never report back to the primary server that WAL was applied. This is perfectly normal; compare commit fd7d387e05.
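If your monitoring alerts on replay_lag, one option is to simply exclude the Barman connection by its application_name. This is just a sketch based on the output above, where the Barman connection shows up as 'barman_receive_wal'; adjust the name to whatever your configuration uses:

-- only real standbys apply WAL, so restrict lag monitoring to them;
-- the pg_receivewal row never reports a meaningful replay_lag
select application_name, state, replay_lsn, replay_lag
from pg_stat_replication
where application_name <> 'barman_receive_wal';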
The “Failures of WAL archiver” are also nothing you have to worry about. This number probably comes from pg_stat_archiver and indicates that there has been a problem archiving WALs in the past. That problem must have been resolved, because “Last archived WAL” indicates that more recent WAL files have been archived successfully. PostgreSQL won't skip archiving WALs: if archiving fails, the archiver gets stuck at that place and retries until it succeeds.
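You can verify this yourself by querying pg_stat_archiver (these columns are available in PostgreSQL 10): if last_archived_time is more recent than last_failed_time, archiving has recovered from the earlier failures.

-- cumulative archive statistics since the last stats reset
select archived_count, last_archived_wal, last_archived_time,
       failed_count, last_failed_wal, last_failed_time
from pg_stat_archiver;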