PostgreSQL – How to Create Backup with Dump and WAL Files

backuplogspostgresqlrestore

I have a backup and WAL files, I like update my backup with WAL files. Anyone know how to do it?

Thanks

Best Answer

You cannot restore a pg_dump (via psql or pg_restore) then update it with WAL archives. It just isn't possible, sorry.

WAL archives are block-level records of changes to individual relation files. They only make sense when applied to an identical filesystem-level copy of the original database as created from a pg_basebackup, via rsync, etc.

If you try to apply WAL archives to a DB that's restored from a pg_dump they'll be rejected because as far as PostgreSQL is concerned they're different, unrelated databases. At the block/filesystem level they are; the blocks are different, the relation IDs are different, etc. The WAL archives make no sense when applied to this database, even though at the logical level the row data looks the same.

You must use a pg_basebackup copy to restore WAL, and it only works if you have a complete and uninterrupted sequence of WAL from the moment the base backup was taken until the final segment.

Related Solutions

Postgresql – How does PostgreSQL handle Checkpoints in the middle of a WAL-enabled backup

You asked:

how postgreSQL will handle the recovery with a pg_data content containing some files which are inconsistent.

pg_start_backup() ensure the data file is at least as new as the checkpoint. On recovery, the logs are applied.

If the data is old, the log will update it..

If the data is new, the log will have same content. There is no hurt writing it again.

The data are never newer then the log, because the logs are write ahead (WAL).

You asked:

... xfs-freeze ...

xfs-freeze is alike to pg_start_backup(), it don't take a snapshot. You need a volume manager to do that.

You asked:

... why do create tablespace & create database statements are unsupported if the WAL can replay everything?

It is supported, just some little gotcha. See http://www.postgresql.org/docs/8.1/static/backup-online.html :

23.3.5. Caveats

CREATE TABLESPACE commands are WAL-logged with the literal absolute path, and will therefore be replayed as tablespace creations with the same absolute path. This might be undesirable if the log is being replayed on a different machine. It can be dangerous even if the log is being replayed on the same machine, but into a new data directory: the replay will still overwrite the contents of the original tablespace. To avoid potential gotchas of this sort, the best practice is to take a new base backup after creating or dropping tablespaces.

PostgreSQL on AWS – How to Get WAL Files from RDS Instance

Update: I've posted about this to the AWS forums - please go chime in and ask for it there.

At time of writing, Amazon RDS does not support physical replication outside RDS. You can GRANT users the REPLICATION right using an rds_superuser login, but you can't configure replication entries for outside IPs in pg_hba.conf.

Furthermore, when you create a DB parameter group in RDS, some key parameters are shown but locked, e.g. archive_command, which is locked to /etc/rds/dbbin/pgscripts/rds_wal_archive %p. AWS RDS for PostgreSQL does not appear to expose these WALs for external access (say, via S3) as it would need to if you were to use WAL-shipping replication for external PITR.

So at this point, if you want wal-shipping, don't use RDS. It's a canned easy-to-use database, but easy-to-use often means that it's also limited, and that's certainly the case here. As Joe Love points out in the comments, it provides WAL shipping and PITR within RDS, but you can't get access to the WAL to that from outside RDS.

So you need to use RDS's own backup facilities - dumps, snapshots and its own WAL-based PITR.

Even if RDS did let you make replication connections (for pg_basebackup or streaming replication) and allowed you to access archived WAL, you might not be able to actually consume that WAL. RDS runs a patched PostgreSQL, though nobody knows how heavily patched or whether it significantly alters the on-disk format. It also runs on the architecture selected by Amazon, which is probably x64 Linux, but not easily determined. Since PostgreSQL's on disk format and replication are architecture dependent, you could only replicate to hosts with the same architecture as that used by Amazon RDS, and only if your PostgreSQL build was compatible with theirs.

Among other things this means that you don't have any easy way to migrate away from RDS. You'd have to stop all writes to the database for long enough to take a pg_dump, restore it, and get the new DB running. The usual tricks with replication and failover, with rsync, etc, won't work because you don't have direct access to the DB host.

Even if RDS ran an unpatched PostgreSQL Amazon probably wouldn't want to permit you to do WAL streaming into RDS or import into RDS using pg_basebackup for security reasons. PostgreSQL treats the data directory as trusted content, and if you've crafted any clever 'LANGUAGE c' functions that hook internal functionality or done anything else tricky you might be able to exploit the server to get greater access than you're supposed to have. So Amazon aren't going to permit inbound WAL anytime soon.

They could support outbound WAL sending, but the above issues with format compatibility, freedom to make changes, etc still apply.

Instead you should use a tool like Londiste or Bucardo.

Best Answer

Related Solutions

Postgresql – How does PostgreSQL handle Checkpoints in the middle of a WAL-enabled backup

PostgreSQL on AWS – How to Get WAL Files from RDS Instance

Related Question