PostgreSQL on AWS – How to Get WAL Files from RDS Instance

amazon ec2postgresql

We have a Postgres RDS instance on Amazon Web Services. We have automatic backups enabled, and we take snapshots on a daily basis. We would like to generate a local 'up-to-date' backup of the RDS instance that we can manage ourselves. Running pg_dump against the instance is not sufficient because we want to be able to restore the database to any point in time. We would prefer to have a local backup of RDS and all WAL files since that backup was taken. Questions:

  1. Is it possible to access the WAL files and backups that RDS is automatically generating in its backup routine? This would be ideal. I would want to download a local copy of them. After initial investigation, I feel the answer to this question is 'no'. It sounds like RDS is storing its WAL files and backups in S3, but it makes them inaccessible to us. I would love confirmation.

  2. Is there any other way to access transactions (WAL files) that have occurred on the RDS instance? I imagine we should be able to create a Postgres database on an EC2 and 'feed' transactions from our primary 'live' RDS instance into this EC2 instance. Once our EC2 instance is updated, we could pull WAL files from there. What a headache, though :/ Is this setup possible? What is the magic to 'feed' from our RDS instance to the EC2 instance so that it is always up to date?

Thanks!

Best Answer

Update: I've posted about this to the AWS forums - please go chime in and ask for it there.


At time of writing, Amazon RDS does not support physical replication outside RDS. You can GRANT users the REPLICATION right using an rds_superuser login, but you can't configure replication entries for outside IPs in pg_hba.conf.

Furthermore, when you create a DB parameter group in RDS, some key parameters are shown but locked, e.g. archive_command, which is locked to /etc/rds/dbbin/pgscripts/rds_wal_archive %p. AWS RDS for PostgreSQL does not appear to expose these WALs for external access (say, via S3) as it would need to if you were to use WAL-shipping replication for external PITR.

So at this point, if you want wal-shipping, don't use RDS. It's a canned easy-to-use database, but easy-to-use often means that it's also limited, and that's certainly the case here. As Joe Love points out in the comments, it provides WAL shipping and PITR within RDS, but you can't get access to the WAL to that from outside RDS.

So you need to use RDS's own backup facilities - dumps, snapshots and its own WAL-based PITR.


Even if RDS did let you make replication connections (for pg_basebackup or streaming replication) and allowed you to access archived WAL, you might not be able to actually consume that WAL. RDS runs a patched PostgreSQL, though nobody knows how heavily patched or whether it significantly alters the on-disk format. It also runs on the architecture selected by Amazon, which is probably x64 Linux, but not easily determined. Since PostgreSQL's on disk format and replication are architecture dependent, you could only replicate to hosts with the same architecture as that used by Amazon RDS, and only if your PostgreSQL build was compatible with theirs.

Among other things this means that you don't have any easy way to migrate away from RDS. You'd have to stop all writes to the database for long enough to take a pg_dump, restore it, and get the new DB running. The usual tricks with replication and failover, with rsync, etc, won't work because you don't have direct access to the DB host.

Even if RDS ran an unpatched PostgreSQL Amazon probably wouldn't want to permit you to do WAL streaming into RDS or import into RDS using pg_basebackup for security reasons. PostgreSQL treats the data directory as trusted content, and if you've crafted any clever 'LANGUAGE c' functions that hook internal functionality or done anything else tricky you might be able to exploit the server to get greater access than you're supposed to have. So Amazon aren't going to permit inbound WAL anytime soon.

They could support outbound WAL sending, but the above issues with format compatibility, freedom to make changes, etc still apply.


Instead you should use a tool like Londiste or Bucardo.