Postgresql – WAL Archiving and pg_basebackup

backuppg-basebackuppostgresqlreplication

For backing up our postgres database, we're considering 2 separate options:

1- Logical backup with pg_dump
2- pg_basebackup with consistent WAL Archiving.

So I have two questions:

1- How often should we make a base backup?
Can we just create it once and then just archive the WAL files afterwards?

2- Do we need both approaches or one would suffice?

Best Answer

How often should we make a base backup? Can we just create it once and then just archive the WAL files afterwards?

You can, but the more WALs that have accumulated, the longer it will take to restore. You should periodically do a test restoration. If it takes "too long" (whatever that means for your organization) then you should take a new base backup which will need to replay less WAL.

What I usually do is keep the first base backup ever taken (for a given major version) so that I can PITR to any point. Then keep annual base backups, so that PITR can take less time if starting from one of them. And take monthly base backups so that crash-recovery restoration would take less time. Once a new monthly base backup has been tested, then the second-previous one is deleted, unless it happens to also be the annual one. If I do some maintenance operation which generates a large amount of WAL, I might take an out-of-cycle backup because otherwise restoration from the last monthly might take too long to replay. Of course this is only an example, you have to suit yourself.

Do we need both approaches or one would suffice?

At some point in the distant future, it might be hard to find suitable hardware unto which to restore a base backup, (with virtual machines that may be less of a concern, but it just pushes it down a level. For how long will it be easy to find VM software that still supports 32-bit guests?). A logical backup could help in that case. That is about the only reason I can think of to take routine "pg_dump" if you are also doing base backups and archiving. I have wanted historical logical dumps occasionally, so I just did PITR onto a dummy server, then ran pg_dump against that dummy server.