PostgreSQL Write Ahead Logs Archive Mode

backuppg-basebackuppostgresql

I'm trying to figure out various things surrounding PostgreSQL and how backups should work together with WAL and Commvault Simpana. Simpana is telling me that everything is ok, but leaves files lying around in the WAL Archive directory.

Let the journey begin.

Environment

PostgreSQL & OS Version

PostgreSQL 9.3 is running on an Ubuntu 14.04.3 LTS server.

Postgres WAL Config

The postgres.conf file is set as follows for WAL archiving.

#------------------------------------------------------------------------------
# WRITE AHEAD LOG
#------------------------------------------------------------------------------
# - Settings -
#wal_level = minimal                    # minimal, archive, or hot_standby
wal_level = archive

[...]

# - Archiving -
archive_mode = on
#archive_mode = off             # allows archiving to be done
                                # (change requires restart)
archive_command = 'cp %p /pgsql-backup/archive/postgres1/%f'
                                # command to use to archive a logfile segment
                                # archive_command = ''           
                                # command to use to archive a logfile segment
                                # placeholders: %p = path of file to archive
                                #               %f = file name only
                                # e.g. 'test ! -f /mnt/server/archivedir/%f    && cp %p /mnt/server/archivedir/%f'
#archive_timeout = 0            # force a logfile segment switch after this
                                # number of seconds; 0 disables

If the test ... part is left in the archive_command it breaks the Simpana backup, which is why we omitted it.

The above configuration should result in the WAL files being copied from the /pg_xlog/ directory to the /pgsql-backup/archive/postgres1/ directory, when …

  1. no longer required, because of a pg_basebackup
  2. a WAL file is full (default 16MB) and no longer in use

Commvault Simpana

The client computer has been configured so that the PostgreSQL databases/instance and WAL files in the Archive Log directory are backed up. The WAL files should be deleted when no longer required, because the Simpana option 'Delete Archive' has been set for the PostgreSQL client.

Expected Behaviour

Because Simpana is performing the backup with PostgreSQL native commands, I expect that when Simpana has completed a full backup or a WAL backup, that the files in the /pgsql-backup/archive/postgres1/ directory are deleted.

Effective Behaviour

When I check the /pgsql-backup/archive/postgres1/ directory after Simpana has performed a backup, there will be one more file in the directory with a 0000000300000037000000nn.mmmmmmmm.backup syntax. This is a hint that Simpana is performing a backup using native PostgreSQL commands, because a 0000000300000037000000nn.mmmmmmmm.backup file would only be created after a PostgreSQL backup of the database/instance using pg_basebackup. This is just my conclusion after reading the documentation for PostgreSQL 9.3.

Here is an example of the content of the directory:

[...]
00000003000000370000007A
00000003000000370000007B.00000028.backup
000000030000003700000091.00000028.backup
000000030000003700000093.00000028.backup
000000030000003700000095.00000028.backup
000000030000003700000097.00000028.backup
000000030000003700000099.00000028.backup
00000003000000370000009B.00000028.backup

PostgreSQL Documentation

The official documentation states that

To make use of the backup, you will need to keep all the WAL segment files generated during
and after the file system backup. To aid you in doing this, the base backup process creates a
backup history file that is immediately stored into the WAL archive area. This file is named after
the first WAL segment file that you need for the file system backup. For example, if the starting
WAL file is 0000000100001234000055CD the backup history file will be named something like
0000000100001234000055CD.007C9330.backup. (The second part of the file name stands
for an exact position within the WAL file, and can ordinarily be ignored.) Once you have safely
archived the file system backup and the WAL segment files used during the backup (as specified in
the backup history file), all archived WAL segments with names numerically less are no longer
needed to recover the file system backup and can be deleted. However, you should consider keeping
several backup sets to be absolutely certain that you can recover your data.

This undermines my conclusion that Simpana is using the native PostgreSQL commands to backup the database/instance and its WAL Archive Log files in the directory /pgsql-backup/archive/postgres1/.

According to the documentation a nnnnnnnnnnnnnnnnnnnnnnn.mmmmmmmm.backup file is a pointer to the earliest WAL file required for a roll forward recovery to succeed. Any older WAL files in the Archive Log directory could be deleted and are no longer required.

What amazes me, is that there is a WAL file in the Archive Log directory without a corresponding *.mmmmmmmm.backup pointer file.

Questions

  1. If I weren't using Simpana for the backup, who would (have to) delete the *.mmmmmmmm.backup files in the WAL Archive directory? The pg_archivecleanup command?
  2. Why is there still a full WAL file in the Archive Log directory, when it should have been deleted like all the other WAL files in the Archive Log directory?
  3. Why isn't there a 00000003000000370000007A.mmmmmmmm.backup file for the still existing 00000003000000370000007A WAL file in the archive log directory?

I'm looking forward to your responses and hope that somebody has a similar configuration of Simpana and PostgreSQL somewhere out there.

Best Answer

This seems to be fundamentally a question about Commvault Simpana, not PostgreSQL. As Commvault seems to be commercial software, you might have the best luck contacting their support desk.

Expected Behaviour Because Simpana is performing the backup with PostgreSQL native commands, I expect that when Simpana has completed a full backup or a WAL backup, that the files in the /pgsql-backup/archive/postgres1/ directory are deleted.

I don't know what "a WAL backup" means here. Is that a terminology specific to Simpana? Does it just mean that the WAL files in your original archive directory have been copied to some off-site storage?

Questions If I weren't using Simpana for the backup, who would (have to) delete the *.mmmmmmmm.backup files in the WAL Archive directory? The pg_archivecleanup command?

If you weren't using Simpana, then you would be using something else. We can't tell you what that something else would be--there are lots of choices. While pg_archivecleanup is one such method, it is starting to look pretty obsolete these days. If you only want to keep your WAL files long enough for them to be safely stored (or replayed) on a stand-by, you would now use "streaming replication", and thereby do away with log shipping altogether.

Or you could have a policy to permanently keep the first base backup ever taken (immediately after you initialized your empty database), and every WAL file archived since then, so that you can do point-in-time-recovery to anytime in the history of your database.

Why is there still a full WAL file in the Archive Log directory, when it should have been deleted like all the other WAL files in the Archive Log directory?

It looks to me like when Simpana decides to clean up the archive, instead of removing all WAL files older than the oldest one currently needed, it instead deletes the range of files starting with the one still needed last time it did a clean up, ending at the one just before the one currently needed.

If this is the case, then if a WAL file was archived by PostgreSQL right after you turned on archiving, but before Simpana had been activated (or before it had gotten its bearings) then that file will never be removed.

Why isn't there a 00000003000000370000007A.mmmmmmmm.backup file for the still existing 00000003000000370000007A WAL file in the archive log directory?

If no backup was initiated during the period that 00000003000000370000007A was the active WAL file, then there never would have been a 00000003000000370000007A.mmmmmmmm.backup file in the first place.