Postgresql – How to cleanup Postgres WAL

Tags: postgresql, postgresql-10

What is the correct way to clean up the Postgres WAL? I have a database of more than 100 GB, and pg_wal holds around 600 GB. I also have 2 logical replications set up.

Master and Slaves – Postgres 10.

On both master and slaves, wal_keep_segments and max_wal_size are commented out, so the defaults apply.

pg_archivecleanup did not work with the %r option, so I took the file name from the "Latest checkpoint's REDO WAL file" line of the pg_controldata output and deleted the logs using that name. After that, one replica stopped with the error:

could not receive data from WAL stream: FATAL: requested WAL segment xxx has already been removed

To solve this I deleted and recreated the replica.

Best Answer

"pg_wal" cleans itself up. You should almost never touch pg_wal by hand. If it is not cleaning itself up, you need to figure out why and fix the underlying issue.

One possible reason is that you have a replication slot which is holding it back. Either a replica is using a slot and is unable to keep up, or you have a slot with no replica attached, for example because you destroyed the replica but didn't drop the slot it used to occupy. You can see what slots you have by querying pg_replication_slots, and if necessary drop one with pg_drop_replication_slot; run both on the master. Look for the slot with the oldest non-NULL value of "restart_lsn".
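As a rough sketch of the slot check described above, something like the following can be run in psql on the master. It assumes Postgres 10 function names (pg_current_wal_lsn, pg_wal_lsn_diff); the slot name 'stale_slot' is a placeholder, not a name from the question.

```sql
-- List slots, oldest restart_lsn first, with the amount of WAL
-- each one forces the server to keep in pg_wal.
SELECT slot_name,
       slot_type,
       active,
       restart_lsn,
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots
WHERE restart_lsn IS NOT NULL
ORDER BY restart_lsn;

-- Only for a slot whose consumer is gone for good.
-- 'stale_slot' is a hypothetical name; use one returned by the query above.
SELECT pg_drop_replication_slot('stale_slot');
```

A slot with active = false and a large retained_wal is the usual suspect. pg_drop_replication_slot raises an error if the slot is still in use, so a connected replica has to be stopped first.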

Another reason is that you have "archive_mode" turned on, but your "archive_command" is constantly failing or can't keep up. You will see warnings about this in your server log file if it is failing.
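Besides reading the server log, the archiver's state can probably be checked from SQL. This is a minimal sketch that assumes the pg_stat_archiver view (available since 9.4) is present on your server:

```sql
-- One row summarising what the archiver process has done so far.
SELECT archived_count,
       last_archived_wal,
       last_archived_time,
       failed_count,
       last_failed_wal,
       last_failed_time
FROM pg_stat_archiver;
```

If failed_count keeps growing and last_failed_time is newer than last_archived_time, archive_command is most likely failing, and pg_wal will keep growing until it succeeds again.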

"pg_archivecleanup" is used to clean up a WAL archive. "pg_wal" is not the archive, it is the live WAL files. You are lucky you didn't destroy your database by monkeying around in there.

Related Solutions

PostgreSQL 9.1 Hot Backup Error: the database system is starting up

The message "The database system is starting up." does not indicate an error. The reason it is at the FATAL level is so that it will always make it to the log, regardless of the setting of log_min_messages:

http://www.postgresql.org/docs/9.1/interactive/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHEN

After the rsync, did you really run what you show?:

pgsql -c "select pg_stop_backup();";

Since there is, so far as I know, no pgsql executable, that would leave the backup uncompleted, and the slave would never come out of recovery mode. On the other hand, maybe you really did run psql, because otherwise I don't see how the slave would have logged such success messages as:

Log: consistent recovery state reached at 0/BF0000B0

and:

Log: streaming replication successfully connected to primary

Did you try connecting to the slave at this point? What happened?

The "Success. You can now start..." message you mention is generated by initdb, which shouldn't be run as part of setting up a slave; so I think you may be confused about something there. I'm also concerned about these apparently conflicting statements:

The only ways I have restarted Postgres is through the service postgresql-9.1 restart or /etc/init.d/postgresql-9.1 restart commands. After I receive this error, I kill all processes and again try to restart the database...

Did you try to stop the service through the service script? What happened? It might help in understanding the logs if you prefixed lines with more information. We use:

log_line_prefix = '[%m] %p %q<%u %d %r> '

The recovery.conf script looks odd. Are you copying from the master's pg_xlog directory, the slave's active pg_xlog directory, or an archive directory?

Postgresql – Postgres Incremental Backup (Continuous Archiving WAL)

In order to restore a backup, you need to have the base archive of all the data files, plus a sequence of xlogs. An "incremental backup" can be made, of just some more xlogs in the sequence. Note that if you have any missing xlogs, then recovery will stop early.

So it's not clear here exactly what you've done, unless you changed the level of detail you're mentioning part way through your list. When you make a copy of more segments that have been put into the archive directory after adding more data, you need to ensure that all the data has been archived: using pg_start_backup and pg_stop_backup usually does this for you, but you don't mention it the second time. You need to at least do a pg_switch_xlog to have the current xlog segment immediately archived.
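For illustration, forcing the switch is a single call. This sketch assumes a 9.x server, where the function is still named pg_switch_xlog; from version 10 on it is pg_switch_wal.

```sql
-- Close the current WAL segment so that it becomes eligible for archiving.
-- Returns the WAL location at which the switch happened.
SELECT pg_switch_xlog();

-- Equivalent on PostgreSQL 10 and later:
-- SELECT pg_switch_wal();
```

The segment still has to be picked up by archive_command afterwards, so copy the archive directory only once that file has appeared there.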

If you think that recovery is not consuming enough xlog segments, look at the recovery log to see if it tried to take them all. And have your recovery command make some sort of mark on which xlog files were taken.
