Postgresql – Streaming replication archive folder taking all space- Postgresql- 9.4

postgresqlpostgresql-9.4replicationwrite-ahead-logging

We had recently setup postgres streaming replication between master (physical, CentOS release 6.6) and slave(virtual, CentOS 7.6). We did the test and streaming replication is working fine.

We created few databases in master and it is getting replicated to slave servers.
Our Postgresql version is 9.4 and data directory is "/var/lib/pgsql/9.4/data" in both master and slave.

While setting up streaming replication, we edit postgresql.conf in master

wal_level = hot_standby
max_wal_senders = 5
wal_keep_segments = 32
archive_mode = on
archive_command = 'cp %p /var/lib/pgsql/9.4/archive/%f'

Our recovery.conf in slave is

standby_mode = 'on'
primary_conninfo = 'host=192.168.0.10 port=5432 user=replication password=password'
trigger_file = '/var/lib/pgsql/9.4/trigger'
restore_command = 'cp /var/lib/pgsql/9.4/archive/%f "%p"'

I also added below command to clear the old archive in recovery.conf in slave

archive_cleanup_command = 'pg_archivecleanup /var/lib/pgsql/9.4/archive %r'

but I started seeing below error in pg_log of slave.

< 2019-01-16 09:02:48.937 EST >WARNING:  archive_cleanup_command "pg_archivecleanup /var/lib/pgsql/9.4/archive %r": child process exited with exit code 2
pg_archivecleanup: archive location "/var/lib/pgsql/9.4/archive" does not exist
< 2019-01-16 09:07:45.927 EST >WARNING:  archive_cleanup_command "pg_archivecleanup /var/lib/pgsql/9.4/archive %r": child process exited with exit code 2

After doing some research I realized that problem is genuine as the slave is not having this directory "/var/lib/pgsql/9.4/archive" and its present in master server only.

Now, we removed "archive_cleanup_command" from recovery.conf (slave) but in our master server "/var/lib/pgsql/9.4/archive" is taking so much of space. We need to clear these old files inside "archive".

I had few doubts and wanted to take some suggestions to resolve the problem :-

Can you suggest what can be done so that we don't run into every day "No space left in /var of the master" issue ?
Can we run the pg_archivecleanup from master server and remove the old files like below

pg_archivecleanup -d archive 000000010000003700000010.00000020.backup
pg_archivecleanup:  keep WAL file "archive/000000010000003700000010" and later
pg_archivecleanup:  removing file "archive/00000001000000370000000F"
pg_archivecleanup:  removing file "archive/00000001000000370000000E"

Should I setup streaming replication again with some mount point attached to both master and slave and save the archive of master to "mount point" and then add "archive_cleanup_command" in recovery.conf?

Best Answer

Does the archive exist only to support the streaming replication, or do you want it for other reasons like PITR? (Since you attempted to configure pg_archivecleanup in the first place, I guess I can assume you don't want it for other reasons...)

Because your restore_command is currently broken, then you know that the WAL archive has not been needed thus far to support streaming replication--because if it had been needed it would not have worked. That is, your replica has never fallen far enough behind that the master has removed the WAL from pg_wal before the replica had fetched it. If you think that this will always continue to be the case, then you just turn achive_mode = off and delete your current archive. There is no point in maintaining an archive that is not and cannot be used.

Since the introduction of replication slots, pg_archivecleanup is more-or-less obsolete. If you want a backstop in case the replica doesn't naturally keep up, you can make a shared mount point and use pg_archivecleanup on that mount point if you want. But configuring a replication slot is a better way to accomplish the same thing.

Related Solutions

PostgreSQL 9.1 Hot Backup Error: the database system is starting up

The message "The database system is starting up." does not indicate an error. The reason it is at the FATAL level is so that it will always make it to the log, regardless of the setting of log_min_messages:

http://www.postgresql.org/docs/9.1/interactive/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHEN

After the rsync, did you really run what you show?:

pgsql -c "select pg_stop_backup();";

Since there is, so far as I know, no pgsql executable, that would leave the backup uncompleted, and the slave would never come out of recovery mode. On the other hand, maybe you really did run psql, because otherwise I don't see how the slave would have logged such success messages as:

Log: consistent recovery state reached at 0/BF0000B0

and:

Log: streaming replication successfully connected to primary

Did you try connecting to the slave at this point? What happened?

The "Success. You can now start..." message you mention is generated by initdb, which shouldn't be run as part of setting up a slave; so I think you may be confused about something there. I'm also concerned about these apparently conflicting statements:

The only ways I have restarted Postgres is through the service postgresql-9.1 restart or /etc/init.d/postgresql-9.1 restart commands. After I receive this error, I kill all processes and again try to restart the database...

Did you try to stop the service through the service script? What happened? It might help in understanding the logs if you prefixed lines with more information. We use:

log_line_prefix = '[%m] %p %q<%u %d %r> '

The recovery.conf script looks odd. Are you copying from the master's pg_xlog directory, the slave's active pg_xlog directory, or an archive directory?

Postgresql – Streaming Replication in PostgreSQL

PostgreSQL replicas never finish recovering. This is by design. Basically a replica is always in "recovering from disaster" mode except that it is using receiving the WAL segments from the master rather than on disk.

So what you are seeing is not cause for concern. If it is not working yet, then you will need to provide a more detailed description of what you are trying to do and what is not working. But as far as you are posting it seems normal.

Best Answer

Related Solutions

PostgreSQL 9.1 Hot Backup Error: the database system is starting up

Postgresql – Streaming Replication in PostgreSQL

Related Question