Most likely what you're seeing is a huge `checkpoint_segments` value and a long `checkpoint_timeout`; alternately, they might have set `wal_keep_segments` to a very large value if it's supposed to support streaming replication.
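To get a feel for how those settings translate into disk usage, here's a rough sketch of the sizing arithmetic from the 8.4 docs: the number of WAL segment files normally stays at or below `(2 + checkpoint_completion_target) * checkpoint_segments + 1`, each segment 16 MB by default. The setting values below are examples, not your server's actual configuration.

```python
# Example values only -- substitute your server's real settings.
checkpoint_segments = 64
checkpoint_completion_target = 0.5  # the PostgreSQL 8.4 default

# Normal upper bound on WAL segment files, per the 8.4 documentation.
max_segments = int((2 + checkpoint_completion_target) * checkpoint_segments) + 1
max_mb = max_segments * 16  # default WAL segment size is 16 MB

print(f"normally at most {max_segments} segments, about {max_mb} MB in pg_xlog")
```

If `pg_xlog` is far larger than this bound for your settings, something else (a failing `archive_command`, or `wal_keep_segments`) is retaining segments.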
You can force a checkpoint with the `CHECKPOINT` command. This may stall the database for some time if it has accumulated a huge amount of WAL and hasn't been background-writing it. If `checkpoint_completion_target` is low (less than 0.8 or 0.9), there's likely to be a big backlog of work to do at checkpoint time. Be prepared for the database to become slow and unresponsive during the checkpoint. There is no normal way to abort a checkpoint once it begins; you can crash the database and restart it, but that just puts you back where you were.
I'm not certain, but I have the feeling a checkpoint could also result in growth of the main database - and do so before any space is freed in the WAL, if it is at all. So a checkpoint could potentially trigger you running out of space, something that's very hard to recover from without adding more storage at least temporarily.
Now would be a very good time to take a proper backup of the database: use `pg_dump -Fc dbname` to dump each database, and `pg_dumpall --globals-only` to dump user definitions etc.
If you can afford the downtime, stop the database and take a file-system-level copy of the entire data directory (the folder containing `pg_xlog`, `pg_clog`, `global`, `base`, etc.). Do not do this while the server is running, and do not omit any files or folders; they are all important (well, except `pg_log`, but it's a good idea to keep the text logs anyway).
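The "copy everything" point above can be sketched as follows. This uses Python's `tarfile` to archive a whole directory tree without omitting anything; the miniature data directory here is a simulation for illustration, so point `data_dir` at your real data directory instead, and only run it with the server stopped.

```python
import os
import tarfile
import tempfile

def backup_data_dir(data_dir, out_path):
    """Archive the whole data directory recursively, omitting nothing."""
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(data_dir, arcname=os.path.basename(data_dir))

# Simulated miniature data directory (stand-in for the real $PGDATA).
data_dir = os.path.join(tempfile.mkdtemp(), "data")
for sub in ("base", "global", "pg_xlog", "pg_clog"):
    os.makedirs(os.path.join(data_dir, sub))
with open(os.path.join(data_dir, "PG_VERSION"), "w") as f:
    f.write("8.4")

out = os.path.join(tempfile.mkdtemp(), "pgdata.tar.gz")
backup_data_dir(data_dir, out)
with tarfile.open(out) as tar:
    names = tar.getnames()
print(len(names), "entries archived")
```

In practice you'd more likely use `tar` directly on the server, but the essential property is the same: the archive must contain every file and folder under the data directory.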
If you'd like more specific comment on the likely cause (and so I can be more confident in my hypothesis), run the following queries and paste their output into your answer (in a code-indented block), then comment so I'm notified:
```sql
SELECT version();

SELECT name, current_setting(name), source
FROM pg_settings
WHERE source NOT IN ('default', 'override');
```
It is possible that setting `checkpoint_completion_target = 1` and then stopping and restarting the DB might cause it to start aggressively writing out queued-up WAL. It won't free any until it does a checkpoint, but you could force one once write activity slows down (as measured with sar, iostat, etc.). I have not tested whether `checkpoint_completion_target` affects already-written WAL when changed across a restart; consider testing this on a throwaway PostgreSQL instance you `initdb` on another machine first.
Backups have nothing to do with WAL retention and growth; it isn't backup related.
Best Answer
Firstly, PostgreSQL 8.4 is no longer supported (since July 2014); you should move your databases to a supported version as soon as possible.
The WAL configuration settings for the 8.4 version are documented here:
https://www.postgresql.org/docs/8.4/static/runtime-config-wal.html
In particular, `checkpoint_segments` controls the number of WAL files.

You must not delete WAL files, whether manually, with a cron job, or via `archive_command`. Doing so would definitely corrupt your database.

WAL files may grow in number if `archive_command` is not working or is lagging behind.
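One quick way to check for that last case: when `archive_command` falls behind, completed WAL segments accumulate `.ready` marker files under `pg_xlog/archive_status`. The sketch below counts them; the directory here is a simulation, so point it at your real `$PGDATA/pg_xlog/archive_status` instead.

```python
import os
import tempfile

def pending_archives(archive_status_dir):
    """Return WAL segment names still waiting to be archived (.ready files)."""
    return sorted(
        name[:-len(".ready")]
        for name in os.listdir(archive_status_dir)
        if name.endswith(".ready")
    )

# Simulated archive_status directory: two segments stuck, one already archived.
demo = tempfile.mkdtemp()
for marker in ("000000010000000000000001.ready",
               "000000010000000000000002.ready",
               "000000010000000000000003.done"):
    open(os.path.join(demo, marker), "w").close()

backlog = pending_archives(demo)
print(len(backlog), "segments waiting for archive_command")
```

A large and growing `.ready` backlog means the archiver is failing or too slow, and `pg_xlog` will keep growing until it catches up. Fix the `archive_command` rather than deleting the files.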