To restore a backup, you need the base archive of all the data files, plus an unbroken sequence of xlogs. An "incremental backup" then amounts to archiving just the next xlogs in that sequence. Note that if any xlogs are missing, recovery will stop early, at the first gap.
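A quick way to check an archive for missing xlogs is to compare the segment file names against the expected sequence. The following is only a sketch: `find_missing_segments` and its helpers are hypothetical names, and it assumes the default 16 MB segment size, a single timeline, and the PostgreSQL 9.3+ naming scheme (256 segments per log file; older releases skipped the last segment of each log file, so pass `segments_per_log=255` there).

```python
import re

# A WAL segment name is 24 uppercase hex digits:
# 8 for the timeline, 8 for the log file id, 8 for the segment within the log.
SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")


def segment_number(name, segments_per_log=256):
    """Convert a segment file name to a single running segment number."""
    log_id = int(name[8:16], 16)
    seg_id = int(name[16:24], 16)
    return log_id * segments_per_log + seg_id


def segment_name(timeline, number, segments_per_log=256):
    """Convert a running segment number back to a segment file name."""
    log_id, seg_id = divmod(number, segments_per_log)
    return "%08X%08X%08X" % (timeline, log_id, seg_id)


def find_missing_segments(filenames, segments_per_log=256):
    """Return the names of segments missing between the lowest and the
    highest segment present. Other files (e.g. *.backup) are ignored."""
    segments = [n for n in filenames if SEGMENT_NAME.match(n)]
    if not segments:
        return []
    timelines = {n[:8] for n in segments}
    if len(timelines) != 1:
        # Assumption of this sketch: timeline switches are not handled.
        raise ValueError("expected a single timeline, found %s" % sorted(timelines))
    timeline = int(segments[0][:8], 16)
    present = {segment_number(n, segments_per_log) for n in segments}
    return [
        segment_name(timeline, i, segments_per_log)
        for i in range(min(present), max(present) + 1)
        if i not in present
    ]
```

You would typically feed it a directory listing of the archive, e.g. `find_missing_segments(os.listdir(archive_dir))`, where `archive_dir` is wherever your archive command copies segments to. Note that it can only find gaps inside the range you have; it cannot tell you that segments are missing from the end.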
So it's not clear exactly what you've done, unless you changed the level of detail part way through your list. When you copy the additional segments that were put into the archive directory after adding more data, you need to ensure that all the data has actually been archived. Using pg_start_backup and pg_stop_backup usually does this for you, but you don't mention them the second time. You need to at least do a pg_switch_xlog to have the current xlog segment archived immediately.
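As an illustration of forcing the segment switch before copying the archive, here is a minimal sketch. It assumes you already have a DB-API 2.0 cursor (for example from psycopg2) on a superuser connection; `force_segment_switch` is a hypothetical helper name. On PostgreSQL 10 and later the function is called pg_switch_wal instead.

```python
def force_segment_switch(cursor):
    """Ask the server to close the current xlog segment so that the
    archive command can pick it up, and return the location reported
    by pg_switch_xlog().

    `cursor` is assumed to be a DB-API 2.0 cursor on a superuser
    connection; this sketch does not open the connection itself.
    """
    cursor.execute("SELECT pg_switch_xlog()")
    row = cursor.fetchone()
    return row[0]
```

The switch only closes the segment; archiving happens asynchronously afterwards, so wait until the segment actually shows up in the archive directory before you take your copy.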
If you think that recovery is not consuming enough xlog segments, look at the recovery log to see whether it tried to take them all. It also helps to have your recovery command (restore_command) record which xlog files it was asked for and which ones it actually found.
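One way to get such a record is to wrap the copy in a small script that logs every request. This is a sketch only: the script, the function name `restore_segment` and the log format are all made up for illustration, and it assumes the archive is a plain directory reachable from the recovering server.

```python
import os
import shutil
import sys
import time


def restore_segment(archive_dir, filename, destination, log_path):
    """Copy one requested file out of the archive and log the request.

    Returns 0 if the file was restored and 1 if it was not in the
    archive. The server relies on a nonzero exit status to learn that
    a requested file does not exist.
    """
    source = os.path.join(archive_dir, filename)
    found = os.path.isfile(source)
    if found:
        shutil.copyfile(source, destination)
    with open(log_path, "a") as log:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        status = "restored" if found else "not found"
        log.write("%s %s %s\n" % (stamp, filename, status))
    return 0 if found else 1


# Expected arguments: archive_dir, %f, %p, log_path
if __name__ == "__main__" and len(sys.argv) == 5:
    sys.exit(restore_segment(*sys.argv[1:5]))
```

With the script saved somewhere on the server, the recovery configuration would contain something like restore_command = 'python3 /path/to/restore_wal.py /path/to/archive %f %p /path/to/restore.log' (all paths hypothetical). Expect some "not found" lines even in a healthy recovery, because the server also asks for files such as timeline history files that may not exist. The last segment marked "restored" shows how far recovery really got.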
Update: I've posted about this to the AWS forums - please go chime in and ask for it there.
At time of writing, Amazon RDS does not support physical replication outside RDS. You can GRANT users the REPLICATION right using an rds_superuser login, but you can't configure replication entries for outside IPs in pg_hba.conf.
Furthermore, when you create a DB parameter group in RDS, some key parameters are shown but locked, e.g. archive_command, which is locked to /etc/rds/dbbin/pgscripts/rds_wal_archive %p. AWS RDS for PostgreSQL does not appear to expose these WALs for external access (say, via S3), as it would need to if you were to use WAL-shipping replication for external PITR.
So at this point, if you want WAL shipping, don't use RDS. It's a canned, easy-to-use database, but easy-to-use often means limited, and that's certainly the case here. As Joe Love points out in the comments, it provides WAL shipping and PITR within RDS, but you can't get access to that WAL from outside RDS.
So you need to use RDS's own backup facilities - dumps, snapshots and its own WAL-based PITR.
Even if RDS did let you make replication connections (for pg_basebackup or streaming replication) and allowed you to access archived WAL, you might not be able to actually consume that WAL. RDS runs a patched PostgreSQL, though nobody knows how heavily patched or whether it significantly alters the on-disk format. It also runs on the architecture selected by Amazon, which is probably x64 Linux, but not easily determined. Since PostgreSQL's on-disk format and replication are architecture dependent, you could only replicate to hosts with the same architecture as that used by Amazon RDS, and only if your PostgreSQL build was compatible with theirs.
Among other things, this means that you don't have any easy way to migrate away from RDS. You'd have to stop all writes to the database for long enough to take a pg_dump, restore it, and get the new DB running. The usual tricks with replication and failover, with rsync, etc, won't work because you don't have direct access to the DB host.
Even if RDS ran an unpatched PostgreSQL, Amazon probably wouldn't want to permit you to do WAL streaming into RDS or import into RDS using pg_basebackup, for security reasons. PostgreSQL treats the data directory as trusted content, and if you've crafted any clever 'LANGUAGE c' functions that hook internal functionality or done anything else tricky, you might be able to exploit the server to get greater access than you're supposed to have. So Amazon aren't going to permit inbound WAL anytime soon.
They could support outbound WAL sending, but the above issues with format compatibility, freedom to make changes, etc still apply.
Instead you should use a tool like Londiste or Bucardo.
Best Answer
If you only have the WAL, with no base backup (copy of the data directory, pg_basebackup, etc), you cannot restore. Full stop. And no, you cannot use a dump from pg_dump to restore WAL on top of.
WAL only contains changes to the data directory, and is meaningless without a base backup to apply it to.
Imagine you have a half page from one of your bank statements, without a running balance shown in a column. You want to use it to find out the balance on your account. You cannot possibly do that, since you don't know the starting or ending balance. Same issue here.