Postgresql – When does PostgreSQL move a WAL file to archive

postgresqlreplication

I'm a complete newbie with PostgreSQL – please forgive my gaffs – and have inherited a service using a master-slave cluster at the backend.

Currently on the slave, the disk is filling up with WAL files in the archive directory.

On a slave (replica) I presume the archived WAL files been applied to the local instance of the database since there are files in pg_xlog with current filestamps.

Q1: Is this correct?

Q2: What is the criteria by which the system decides that a WAL file is eligible to be moved from pg_xlog to the archive?

The backup/restore model is predicated around filesystem snapshots – so a restore operation would rely on crash recovery. I infer from this that I should only need the artchived WAL files since the last file-system snapshot to be available (although keeping at least 2 generations would be a better idea).

Q3: Is this correct?

Best Answer

Who is writing to the replica's archive directory, the master (via scp or something like that) or the replica?

The replica only archives if archive_mode=always. If it just set to 'on', then only the master archives.

What is the criteria by which the system decides that a WAL file is eligible to be moved from pg_xlog to the archive?

They are never moved. They are copied to the archive. And then at some later point (after at least a checkpoint), they are deleted/overwritten from pg_xlog/pg_wal. The criteria for deletion are all of: a successful checkpoint/restartpoint, wal_keep_segments satisfied, all replication slots (if any) satisfied, archive_command (if in use) returns success.

The backup/restore model is predicated around filesystem snapshots - so a restore operation would rely on crash recovery.

Who are you trying to restore here, the master or the replica? I just recreate a replica if it gets lost, but some people find that too slow and want to be able to restore it. And some people wouldn't recovery the master at all, they would just promote the replica.

I infer from this that I should only need the artchived WAL files since the last file-system snapshot to be available (although keeping at least 2 generations would be a better idea).

Unless your file-system snapshot has itself been backed up somewhere, I would assume it is no longer available to be used after a major crash. So you need all the WAL since the beginning of the most recent snapshot which has itself been backed up offsite. If you want to retain the ability to do PITR, you need more than that.