PostgreSQL – Slave Keeps xlog Files with Replication Slot

master-slave-replicationpostgresqlreplication

I setup one master and two PostgreSQL servers (v9.6). In the master (and slaves) I set max_replication_slots = 2 and archive_mode is commented out everywhere. The replication works fine: There are no problems and everything is up to date.

On the master the xlog directory is really small. So it's deleting the files which is fine and intended. But on both slave servers the files in the xlog directory are not getting deleted. Do I have to configure something special?

This is the postgresql.conf without the comments:

data_directory = '/data/postgres-data-9.6'
hba_file = '/etc/postgresql/9.6/main/pg_hba.conf'
ident_file = '/etc/postgresql/9.6/main/pg_ident.conf'
external_pid_file = '/var/run/postgresql/9.6-main.pid'
listen_addresses = '*'
max_connections = 600
unix_socket_directories = '/var/run/postgresql'
ssl = on
ssl_cert_file = '/etc/postgresql/9.6/main/ssl/server.crt'
ssl_key_file = '/etc/postgresql/9.6/main/ssl/server.key'
shared_buffers = 6GB
work_mem = 50MB
maintenance_work_mem = 1280MB
dynamic_shared_memory_type = posix
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all
wal_level = hot_standby
wal_buffers = 1MB
max_wal_size = 1024MB
checkpoint_completion_target = 0.9
max_wal_senders = 10
max_replication_slots = 2
hot_standby = on
effective_cache_size = 15GB
log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_min_duration_statement = 200
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d '
log_timezone = 'UTC'
track_counts = on
track_activity_query_size = 2048
autovacuum = on
log_autovacuum_min_duration = 0
timezone = 'UTC'
lc_messages = 'en_US.utf8'
lc_monetary = 'en_US.utf8'
lc_numeric = 'en_US.utf8'
lc_time = 'en_US.utf8'
default_text_search_config = 'pg_catalog.english'

The recovery.conf:

standby_mode = 'on'
primary_conninfo = 'user=replicator password=*** 
host=*** port=5432 sslmode=prefer sslcompression=1 
krbsrvname=postgres'
primary_slot_name = '***'
trigger_file = '/data/postgres-data-9.6/postgresql.trigger.5432'

Do you know if I forgot something? The goal is to not keep the old files in the xlog directory.

//edit:

After adding the configuration line "log_checkpoints" and restarting:

2018-07-20 09:05:33 UTC [30903]: [1-1] user=,db= LOG:  database system was shut down in recovery at 2018-07-20 09:05:30 UTC
2018-07-20 09:05:33 UTC [30903]: [2-1] user=,db= LOG:  entering standby mode
2018-07-20 09:05:33 UTC [30904]: [1-1] user=[unknown],db=[unknown] LOG:  incomplete startup packet
2018-07-20 09:05:33 UTC [30903]: [3-1] user=,db= LOG:  redo starts at 694/3D052D88
2018-07-20 09:05:34 UTC [30903]: [4-1] user=,db= LOG:  consistent recovery state reached at 694/4682CBC8
2018-07-20 09:05:34 UTC [30903]: [5-1] user=,db= LOG:  invalid record length at 694/4682CBC8: wanted 24, got 0
2018-07-20 09:05:34 UTC [30901]: [3-1] user=,db= LOG:  database system is ready to accept read only connections
2018-07-20 09:05:34 UTC [30917]: [1-1] user=,db= LOG:  started streaming WAL from primary at 694/46000000 on timeline 1
2018-07-20 09:10:33 UTC [30914]: [1-1] user=,db= LOG:  restartpoint starting: time
2018-07-20 09:15:05 UTC [30914]: [2-1] user=,db= LOG:  restartpoint complete: wrote 31615 buffers (4.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=269.402 s, sync=2.009 s, total=272.076 s; sync files=788, longest=0.167 s, average=0.002 s; distance=113433 kB, estimate=113433 kB
2018-07-20 09:15:05 UTC [30914]: [3-1] user=,db= LOG:  recovery restart point at 694/43F19248
2018-07-20 09:15:05 UTC [30914]: [4-1] user=,db= DETAIL:  last completed transaction was at log time 2018-07-20 09:15:05.359356+00
2018-07-20 09:15:33 UTC [30914]: [5-1] user=,db= LOG:  restartpoint starting: time
2018-07-20 09:20:06 UTC [30914]: [6-1] user=,db= LOG:  restartpoint complete: wrote 11527 buffers (1.5%); 1 transaction log file(s) added, 0 removed, 0 recycled; write=268.760 s, sync=2.820 s, total=272.720 s; sync files=729, longest=0.204 s, average=0.003 s; distance=90020 kB, estimate=111091 kB
2018-07-20 09:20:06 UTC [30914]: [7-1] user=,db= LOG:  recovery restart point at 694/49702568
2018-07-20 09:20:06 UTC [30914]: [8-1] user=,db= DETAIL:  last completed transaction was at log time 2018-07-20 09:20:06.242342+00

//edit: Only the "restartpoint complete" lines the last 3 hours:

2018-07-20 09:15:05 UTC [30914]: [2-1] user=,db= LOG:  restartpoint complete: wrote 31615 buffers (4.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=269.402 s, sync=2.009 s, total=272.076 s; sync files=788, longest=0.167 s, average=0.002 s; distance=113433 kB, estimate=113433 kB
2018-07-20 09:20:06 UTC [30914]: [6-1] user=,db= LOG:  restartpoint complete: wrote 11527 buffers (1.5%); 1 transaction log file(s) added, 0 removed, 0 recycled; write=268.760 s, sync=2.820 s, total=272.720 s; sync files=729, longest=0.204 s, average=0.003 s; distance=90020 kB, estimate=111091 kB
2018-07-20 09:25:06 UTC [30914]: [10-1] user=,db= LOG:  restartpoint complete: wrote 20321 buffers (2.6%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=269.531 s, sync=2.279 s, total=272.606 s; sync files=704, longest=0.214 s, average=0.003 s; distance=114923 kB, estimate=114923 kB
2018-07-20 09:30:04 UTC [30914]: [14-1] user=,db= LOG:  restartpoint complete: wrote 11241 buffers (1.4%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=269.368 s, sync=1.003 s, total=271.055 s; sync files=778, longest=0.184 s, average=0.001 s; distance=79632 kB, estimate=111394 kB
2018-07-20 09:35:05 UTC [30914]: [18-1] user=,db= LOG:  restartpoint complete: wrote 13326 buffers (1.7%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=269.852 s, sync=1.806 s, total=272.220 s; sync files=708, longest=0.185 s, average=0.002 s; distance=104062 kB, estimate=110661 kB
2018-07-20 09:40:05 UTC [30914]: [22-1] user=,db= LOG:  restartpoint complete: wrote 15974 buffers (2.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=269.187 s, sync=2.468 s, total=272.209 s; sync files=650, longest=0.212 s, average=0.003 s; distance=87183 kB, estimate=108313 kB
2018-07-20 09:45:04 UTC [30914]: [26-1] user=,db= LOG:  restartpoint complete: wrote 11135 buffers (1.4%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=268.849 s, sync=0.768 s, total=270.347 s; sync files=723, longest=0.137 s, average=0.001 s; distance=89407 kB, estimate=106423 kB
2018-07-20 09:50:04 UTC [30914]: [30-1] user=,db= LOG:  restartpoint complete: wrote 8950 buffers (1.1%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=269.516 s, sync=1.556 s, total=271.776 s; sync files=698, longest=0.140 s, average=0.002 s; distance=97044 kB, estimate=105485 kB
2018-07-20 09:55:04 UTC [30914]: [34-1] user=,db= LOG:  restartpoint complete: wrote 12561 buffers (1.6%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=268.798 s, sync=1.403 s, total=270.810 s; sync files=735, longest=0.164 s, average=0.001 s; distance=68582 kB, estimate=101794 kB
2018-07-20 10:00:04 UTC [30914]: [38-1] user=,db= LOG:  restartpoint complete: wrote 11192 buffers (1.4%); 1 transaction log file(s) added, 0 removed, 0 recycled; write=268.888 s, sync=1.082 s, total=271.150 s; sync files=663, longest=0.123 s, average=0.001 s; distance=68565 kB, estimate=98471 kB
2018-07-20 10:05:04 UTC [30914]: [42-1] user=,db= LOG:  restartpoint complete: wrote 7842 buffers (1.0%); 1 transaction log file(s) added, 0 removed, 0 recycled; write=268.907 s, sync=0.495 s, total=270.101 s; sync files=636, longest=0.130 s, average=0.000 s; distance=66920 kB, estimate=95316 kB
2018-07-20 10:10:04 UTC [30914]: [46-1] user=,db= LOG:  restartpoint complete: wrote 7200 buffers (0.9%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=269.618 s, sync=0.491 s, total=271.083 s; sync files=660, longest=0.200 s, average=0.000 s; distance=75752 kB, estimate=93360 kB
2018-07-20 10:15:04 UTC [30914]: [50-1] user=,db= LOG:  restartpoint complete: wrote 5525 buffers (0.7%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=269.352 s, sync=1.028 s, total=271.297 s; sync files=634, longest=0.170 s, average=0.001 s; distance=55101 kB, estimate=89534 kB
2018-07-20 10:20:04 UTC [30914]: [54-1] user=,db= LOG:  restartpoint complete: wrote 4903 buffers (0.6%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=269.189 s, sync=1.073 s, total=270.779 s; sync files=673, longest=0.216 s, average=0.001 s; distance=36054 kB, estimate=84186 kB
2018-07-20 10:25:04 UTC [30914]: [58-1] user=,db= LOG:  restartpoint complete: wrote 5555 buffers (0.7%); 1 transaction log file(s) added, 0 removed, 0 recycled; write=269.495 s, sync=0.643 s, total=270.927 s; sync files=659, longest=0.160 s, average=0.000 s; distance=36557 kB, estimate=79423 kB
2018-07-20 10:30:05 UTC [30914]: [62-1] user=,db= LOG:  restartpoint complete: wrote 6671 buffers (0.8%); 1 transaction log file(s) added, 0 removed, 0 recycled; write=269.471 s, sync=1.082 s, total=271.833 s; sync files=680, longest=0.138 s, average=0.001 s; distance=32391 kB, estimate=74720 kB
2018-07-20 10:35:04 UTC [30914]: [66-1] user=,db= LOG:  restartpoint complete: wrote 11940 buffers (1.5%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=269.910 s, sync=0.594 s, total=271.064 s; sync files=632, longest=0.141 s, average=0.000 s; distance=43135 kB, estimate=71561 kB
2018-07-20 10:40:04 UTC [30914]: [70-1] user=,db= LOG:  restartpoint complete: wrote 4944 buffers (0.6%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=269.377 s, sync=1.091 s, total=271.065 s; sync files=599, longest=0.148 s, average=0.001 s; distance=34289 kB, estimate=67834 kB
2018-07-20 10:45:04 UTC [30914]: [74-1] user=,db= LOG:  restartpoint complete: wrote 6011 buffers (0.8%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=269.588 s, sync=1.378 s, total=271.442 s; sync files=589, longest=0.173 s, average=0.002 s; distance=64235 kB, estimate=67474 kB
2018-07-20 10:50:04 UTC [30914]: [78-1] user=,db= LOG:  restartpoint complete: wrote 4415 buffers (0.6%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=269.144 s, sync=1.205 s, total=270.871 s; sync files=553, longest=0.137 s, average=0.002 s; distance=41258 kB, estimate=64853 kB
2018-07-20 10:55:05 UTC [30914]: [82-1] user=,db= LOG:  restartpoint complete: wrote 7277 buffers (0.9%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=269.271 s, sync=1.823 s, total=271.671 s; sync files=684, longest=0.183 s, average=0.002 s; distance=31072 kB, estimate=61475 kB
2018-07-20 11:00:03 UTC [30914]: [86-1] user=,db= LOG:  restartpoint complete: wrote 9467 buffers (1.2%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=269.347 s, sync=0.461 s, total=270.416 s; sync files=652, longest=0.134 s, average=0.000 s; distance=29704 kB, estimate=58298 kB
2018-07-20 11:05:04 UTC [30914]: [90-1] user=,db= LOG:  restartpoint complete: wrote 5571 buffers (0.7%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=269.042 s, sync=1.026 s, total=270.529 s; sync files=573, longest=0.147 s, average=0.001 s; distance=53303 kB, estimate=57798 kB
2018-07-20 11:10:04 UTC [30914]: [94-1] user=,db= LOG:  restartpoint complete: wrote 12892 buffers (1.6%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=269.290 s, sync=0.825 s, total=270.740 s; sync files=580, longest=0.190 s, average=0.001 s; distance=66788 kB, estimate=66788 kB
2018-07-20 11:15:04 UTC [30914]: [98-1] user=,db= LOG:  restartpoint complete: wrote 7541 buffers (1.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=269.596 s, sync=0.990 s, total=271.140 s; sync files=620, longest=0.132 s, average=0.001 s; distance=68238 kB, estimate=68238 kB
2018-07-20 11:20:05 UTC [30914]: [102-1] user=,db= LOG:  restartpoint complete: wrote 8326 buffers (1.1%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=269.309 s, sync=2.045 s, total=271.879 s; sync files=611, longest=0.143 s, average=0.003 s; distance=55564 kB, estimate=66970 kB
2018-07-20 11:25:04 UTC [30914]: [106-1] user=,db= LOG:  restartpoint complete: wrote 9998 buffers (1.3%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=269.623 s, sync=1.390 s, total=271.495 s; sync files=624, longest=0.152 s, average=0.002 s; distance=70330 kB, estimate=70330 kB
2018-07-20 11:30:04 UTC [30914]: [110-1] user=,db= LOG:  restartpoint complete: wrote 9649 buffers (1.2%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=268.755 s, sync=1.367 s, total=270.817 s; sync files=630, longest=0.141 s, average=0.002 s; distance=43960 kB, estimate=67693 kB
2018-07-20 11:35:05 UTC [30914]: [114-1] user=,db= LOG:  restartpoint complete: wrote 11041 buffers (1.4%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=269.630 s, sync=2.046 s, total=272.287 s; sync files=592, longest=0.160 s, average=0.003 s; distance=65727 kB, estimate=67497 kB
2018-07-20 11:40:04 UTC [30914]: [118-1] user=,db= LOG:  restartpoint complete: wrote 10156 buffers (1.3%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=268.753 s, sync=1.492 s, total=270.788 s; sync files=686, longest=0.153 s, average=0.002 s; distance=90728 kB, estimate=90728 kB
2018-07-20 11:45:05 UTC [30914]: [122-1] user=,db= LOG:  restartpoint complete: wrote 8217 buffers (1.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=269.309 s, sync=1.210 s, total=271.452 s; sync files=659, longest=0.144 s, average=0.001 s; distance=61239 kB, estimate=87779 kB
2018-07-20 11:50:05 UTC [30914]: [126-1] user=,db= LOG:  restartpoint complete: wrote 10599 buffers (1.3%); 1 transaction log file(s) added, 0 removed, 0 recycled; write=269.456 s, sync=1.871 s, total=272.040 s; sync files=681, longest=0.198 s, average=0.002 s; distance=66392 kB, estimate=85640 kB
2018-07-20 11:55:05 UTC [30914]: [130-1] user=,db= LOG:  restartpoint complete: wrote 11844 buffers (1.5%); 1 transaction log file(s) added, 0 removed, 0 recycled; write=269.670 s, sync=1.649 s, total=272.103 s; sync files=541, longest=0.162 s, average=0.003 s; distance=59357 kB, estimate=83012 kB
2018-07-20 12:00:04 UTC [30914]: [134-1] user=,db= LOG:  restartpoint complete: wrote 11374 buffers (1.4%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=269.221 s, sync=1.348 s, total=271.224 s; sync files=684, longest=0.188 s, average=0.001 s; distance=79870 kB, estimate=82698 kB
2018-07-20 12:05:05 UTC [30914]: [138-1] user=,db= LOG:  restartpoint complete: wrote 14003 buffers (1.8%); 1 transaction log file(s) added, 0 removed, 0 recycled; write=269.311 s, sync=1.766 s, total=271.816 s; sync files=646, longest=0.148 s, average=0.002 s; distance=74689 kB, estimate=81897 kB
2018-07-20 12:10:05 UTC [30914]: [142-1] user=,db= LOG:  restartpoint complete: wrote 12900 buffers (1.6%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=270.474 s, sync=0.592 s, total=271.820 s; sync files=662, longest=0.175 s, average=0.000 s; distance=56910 kB, estimate=79398 kB

Best Answer

I think you didn't follow the advice from the docs:

It is often a good idea to also omit from the backup the files within the cluster's pg_replslot/ directory, so that replication slots that exist on the master do not become part of the backup. Otherwise, the subsequent use of the backup to create a standby may result in indefinite retention of WAL files on the standby, and possibly bloat on the master if hot standby feedback is enabled, because the clients that are using those replication slots will still be connecting to and updating the slots on the master, not the standby.

If you shut down the replicas, remove those files, and start them back up again, it should fix the problem.