Postgresql – Streaming replication archive folder taking all space- Postgresql- 9.4

postgresqlpostgresql-9.4replicationwrite-ahead-logging

We had recently setup postgres streaming replication between master (physical, CentOS release 6.6) and slave(virtual, CentOS 7.6). We did the test and streaming replication is working fine.

We created few databases in master and it is getting replicated to slave servers.
Our Postgresql version is 9.4 and data directory is "/var/lib/pgsql/9.4/data" in both master and slave.

While setting up streaming replication, we edit postgresql.conf in master

wal_level = hot_standby
max_wal_senders = 5
wal_keep_segments = 32
archive_mode = on
archive_command = 'cp %p /var/lib/pgsql/9.4/archive/%f'

Our recovery.conf in slave is

standby_mode = 'on'
primary_conninfo = 'host=192.168.0.10 port=5432 user=replication password=password'
trigger_file = '/var/lib/pgsql/9.4/trigger'
restore_command = 'cp /var/lib/pgsql/9.4/archive/%f "%p"'

I also added below command to clear the old archive in recovery.conf in slave

archive_cleanup_command = 'pg_archivecleanup /var/lib/pgsql/9.4/archive %r'

but I started seeing below error in pg_log of slave.

< 2019-01-16 09:02:48.937 EST >WARNING:  archive_cleanup_command "pg_archivecleanup /var/lib/pgsql/9.4/archive %r": child process exited with exit code 2
pg_archivecleanup: archive location "/var/lib/pgsql/9.4/archive" does not exist
< 2019-01-16 09:07:45.927 EST >WARNING:  archive_cleanup_command "pg_archivecleanup /var/lib/pgsql/9.4/archive %r": child process exited with exit code 2

After doing some research I realized that problem is genuine as the slave is not having this directory "/var/lib/pgsql/9.4/archive" and its present in master server only.

Now, we removed "archive_cleanup_command" from recovery.conf (slave) but in our master server "/var/lib/pgsql/9.4/archive" is taking so much of space. We need to clear these old files inside "archive".

I had few doubts and wanted to take some suggestions to resolve the problem :-

  1. Can you suggest what can be done so that we don't run into every day "No space left in /var of the master" issue ?
    Can we run the pg_archivecleanup from master server and remove the old files like below
pg_archivecleanup -d archive 000000010000003700000010.00000020.backup
pg_archivecleanup:  keep WAL file "archive/000000010000003700000010" and later
pg_archivecleanup:  removing file "archive/00000001000000370000000F"
pg_archivecleanup:  removing file "archive/00000001000000370000000E"
  1. Should I setup streaming replication again with some mount point attached to both master and slave and save the archive of master to "mount point" and then add "archive_cleanup_command" in recovery.conf?

Best Answer

Does the archive exist only to support the streaming replication, or do you want it for other reasons like PITR? (Since you attempted to configure pg_archivecleanup in the first place, I guess I can assume you don't want it for other reasons...)

Because your restore_command is currently broken, then you know that the WAL archive has not been needed thus far to support streaming replication--because if it had been needed it would not have worked. That is, your replica has never fallen far enough behind that the master has removed the WAL from pg_wal before the replica had fetched it. If you think that this will always continue to be the case, then you just turn achive_mode = off and delete your current archive. There is no point in maintaining an archive that is not and cannot be used.

Since the introduction of replication slots, pg_archivecleanup is more-or-less obsolete. If you want a backstop in case the replica doesn't naturally keep up, you can make a shared mount point and use pg_archivecleanup on that mount point if you want. But configuring a replication slot is a better way to accomplish the same thing.