PostgreSQL does not care when the WAL logs are generated, it just reads them sequentially on restore/recovery.
If the WAL logs were properly sent after the restart of the database, they should be restored as well.
However, if there is even a single WAL file missing for some reason (in this example, I'd guess that'd be either the last segment before the shutdown or one or more of the first ones from after the startup), which is typically either an incorrectly configured archive_command (missing error checks for example) or simply a broken archive server, the restore process cannot go past that point.
Update: I've posted about this to the AWS forums - please go chime in and ask for it there.
At time of writing, Amazon RDS does not support physical replication outside RDS. You can GRANT
users the REPLICATION
right using an rds_superuser
login, but you can't configure replication
entries for outside IPs in pg_hba.conf
.
Furthermore, when you create a DB parameter group in RDS, some key parameters are shown but locked, e.g. archive_command
, which is locked to /etc/rds/dbbin/pgscripts/rds_wal_archive %p
. AWS RDS for PostgreSQL does not appear to expose these WALs for external access (say, via S3) as it would need to if you were to use WAL-shipping replication for external PITR.
So at this point, if you want wal-shipping, don't use RDS. It's a canned easy-to-use database, but easy-to-use often means that it's also limited, and that's certainly the case here. As Joe Love points out in the comments, it provides WAL shipping and PITR within RDS, but you can't get access to the WAL to that from outside RDS.
So you need to use RDS's own backup facilities - dumps, snapshots and its own WAL-based PITR.
Even if RDS did let you make replication connections (for pg_basebackup
or streaming replication) and allowed you to access archived WAL, you might not be able to actually consume that WAL. RDS runs a patched PostgreSQL, though nobody knows how heavily patched or whether it significantly alters the on-disk format. It also runs on the architecture selected by Amazon, which is probably x64 Linux, but not easily determined. Since PostgreSQL's on disk format and replication are architecture dependent, you could only replicate to hosts with the same architecture as that used by Amazon RDS, and only if your PostgreSQL build was compatible with theirs.
Among other things this means that you don't have any easy way to migrate away from RDS. You'd have to stop all writes to the database for long enough to take a pg_dump
, restore it, and get the new DB running. The usual tricks with replication and failover, with rsync, etc, won't work because you don't have direct access to the DB host.
Even if RDS ran an unpatched PostgreSQL Amazon probably wouldn't want to permit you to do WAL streaming into RDS or import into RDS using pg_basebackup
for security reasons. PostgreSQL treats the data directory as trusted content, and if you've crafted any clever 'LANGUAGE c' functions that hook internal functionality or done anything else tricky you might be able to exploit the server to get greater access than you're supposed to have. So Amazon aren't going to permit inbound WAL anytime soon.
They could support outbound WAL sending, but the above issues with format compatibility, freedom to make changes, etc still apply.
Instead you should use a tool like Londiste or Bucardo.
Best Answer
You can, but the more WALs that have accumulated, the longer it will take to restore. You should periodically do a test restoration. If it takes "too long" (whatever that means for your organization) then you should take a new base backup which will need to replay less WAL.
What I usually do is keep the first base backup ever taken (for a given major version) so that I can PITR to any point. Then keep annual base backups, so that PITR can take less time if starting from one of them. And take monthly base backups so that crash-recovery restoration would take less time. Once a new monthly base backup has been tested, then the second-previous one is deleted, unless it happens to also be the annual one. If I do some maintenance operation which generates a large amount of WAL, I might take an out-of-cycle backup because otherwise restoration from the last monthly might take too long to replay. Of course this is only an example, you have to suit yourself.
At some point in the distant future, it might be hard to find suitable hardware unto which to restore a base backup, (with virtual machines that may be less of a concern, but it just pushes it down a level. For how long will it be easy to find VM software that still supports 32-bit guests?). A logical backup could help in that case. That is about the only reason I can think of to take routine "pg_dump" if you are also doing base backups and archiving. I have wanted historical logical dumps occasionally, so I just did PITR onto a dummy server, then ran pg_dump against that dummy server.