Postgresql – Best practice for securely backing up RDS Postgres “offsite”

amazon-rds aws backup postgresql

We're running Postgres on RDS, which is largely great. The big issue with it, however, is AWS' security model, which allows anyone with the right permissions to delete everything: your DBs, your backups, the whole lot.

In particular, you can't prevent someone with access to create IAM users and groups from being able to then give a new user more permissions than they themselves have, so either compromised credentials or a disgruntled employee could destroy everything if you rely on RDS' own backups.

EDIT:

Just in case you're wondering what the issue might be, have a quick read of http://www.infoworld.com/article/2608076/data-center/murder-in-the-amazon-cloud.html

So, the "sensible" thing to do seems to be to have a separate AWS account to which basically no one has access, plus a key which can write to S3 in that account (and possibly read things back if you fancy, though this is probably optional).

This way, you can back things up to an account from which your main AWS admins can't, by accident or design, delete stuff, and then use lifecycle rules to manage it.
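As a rough sketch of what that setup could look like, the snippet below builds the two JSON documents involved: an IAM policy that only allows uploads, and a lifecycle configuration that expires old backups. The bucket name and the 90-day retention are assumptions for illustration; nothing here calls AWS, it only prints the documents so they can be reviewed before being applied.

```python
import json

# Hypothetical bucket name in the separate backup account (an assumption).
BACKUP_BUCKET = "example-offsite-db-backups"


def write_only_policy(bucket: str) -> dict:
    """IAM policy document that allows uploading objects and nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowBackupUploadsOnly",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


def expiry_lifecycle(days: int) -> dict:
    """Lifecycle configuration that expires every object after `days` days."""
    return {
        "Rules": [
            {
                "ID": f"expire-after-{days}-days",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: applies to all objects
                "Expiration": {"Days": days},
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(write_only_policy(BACKUP_BUCKET), indent=2))
    print(json.dumps(expiry_lifecycle(90), indent=2))
```

One caveat: a key that can only call s3:PutObject can still overwrite an existing object by reusing its name, so enabling bucket versioning on the backup bucket is probably worth considering as well.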

Sorry for the long build up – I am literally amazed that people don't seem to have asked/answered this before, as it seems such an obvious thing for almost anyone using RDS (or indeed just AWS) to need to do, but…

How do I back up Postgres in a sensible fashion for this?

Some things to consider:

Storage space and so on aren't infinite, so ideally we don't want to be doing a full pg_dump/gzip/encrypt/upload to S3, which is the obvious solution, as it would mean hundreds of GB a day going up there, which is probably overkill.
We don't have access to the "core" servers, so we can't do differential-type backups as would be more "normal".

Would it work to do a pg_dump, then use some sort of diff program to only upload/store the diffs? Since I don't think pg_dump produces things in a specific order, I'm not sure this would work (in the way intended)?
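One way to picture the "only upload what changed" idea is at file level instead of line level. The sketch below assumes a hypothetical layout with one uncompressed dump file per table in a directory, hashes each file, and reports which ones differ from the previous run's manifest. The function names and the layout are assumptions for illustration. If row order changes between runs, a table will show up as changed even when its data is the same, so this only saves space for tables that are genuinely static.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB blocks to keep memory use flat."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1024 * 1024), b""):
            h.update(block)
    return h.hexdigest()


def changed_files(
    dump_dir: Path, previous: dict[str, str]
) -> tuple[list[str], dict[str, str]]:
    """Return (names of new or modified files, new manifest of name -> digest)."""
    current = {
        p.name: file_digest(p)
        for p in sorted(dump_dir.iterdir())
        if p.is_file()
    }
    changed = [name for name, digest in current.items()
               if previous.get(name) != digest]
    return changed, current
```

Hashing would have to happen before any gzip or encryption step, since those typically produce different bytes for identical input from one run to the next.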

Any other ideas?

I'm aware by the way of the use of manual snapshot sharing – https://aws.amazon.com/blogs/aws/amazon-rds-update-cross-account-snapshot-sharing/ – which works great if you're using unencrypted RDS – but we aren't, for reasons I can't fully explain except that "it sounds good".

Best Answer

I think your idea of a separate machine in AWS which has access to the DB is generally the right way to do it. I would just add that the "normal" way of making incremental backups (with any "real" database) is called PITR, for point-in-time recovery. If you search for this term together with PostgreSQL, you will find tutorials on how to do generic incremental database backups, and it's up to you to implement them using AWS.

Related Solutions

Postgresql – How to seamlessly upgrade the major version of an AWS RDS postgres database

This is a good question; working in a cloud environment is tricky sometimes.

You can use the pg_dumpall -f dump.sql command, which dumps your entire database cluster to a SQL file so that you can rebuild it from scratch against another endpoint, for example with psql -h endpoint-host.com.br -f dump.sql.

To do that, you will need an EC2 instance with enough disk space to hold your database dump. You will also need to install the PostgreSQL client tools (for example, yum install postgresql94.x86_64) to be able to run the dump and restore commands.

See examples at PG Dumpall DOC.

Remember that to keep the integrity of your data, it is recommended (in some cases it will be mandatory) that you shut down the systems that connect to the database during this maintenance window.

Also, if you need to speed things up, consider using pg_dump instead of pg_dumpall and taking advantage of its parallelism parameter (-j njobs), which requires the directory output format (-Fd). Choose the number of jobs based on the CPUs available; for example, -j 8 runs up to 8 jobs in parallel. By default, pg_dump and pg_dumpall use only one. The only disadvantage of using pg_dump instead of pg_dumpall is that you will need to run the command for each database that you have, and also dump the roles (groups and users) separately.
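To make the per-database approach concrete, here is a minimal sketch that builds the command lines for a parallel dump and restore. The host names, user, database names and the /backup directory are placeholders, and the helper functions are hypothetical. It only prints the commands; running them is left to you. Note that pg_dump accepts -j only with the directory format (-Fd).

```python
import shlex


def dump_commands(host, user, databases, jobs=8, out_dir="/backup"):
    """Commands to dump roles once, then each database in parallel."""
    # Roles are cluster-wide, so they come from pg_dumpall, not pg_dump.
    # Assumption: on RDS this step may need extra flags, because the
    # master user is not a true superuser; check your client version.
    cmds = [["pg_dumpall", "-h", host, "-U", user,
             "--roles-only", "-f", f"{out_dir}/roles.sql"]]
    for db in databases:
        # -Fd (directory format) is required for -j on pg_dump.
        cmds.append(["pg_dump", "-h", host, "-U", user,
                     "-Fd", "-j", str(jobs), "-f", f"{out_dir}/{db}", db])
    return cmds


def restore_commands(host, user, databases, jobs=8, out_dir="/backup"):
    """Commands to restore roles first, then each database in parallel."""
    cmds = [["psql", "-h", host, "-U", user, "-d", "postgres",
             "-f", f"{out_dir}/roles.sql"]]
    for db in databases:
        # --create makes pg_restore create the database before loading it;
        # -d postgres is only the database used for the initial connection.
        cmds.append(["pg_restore", "-h", host, "-U", user, "-d", "postgres",
                     "--create", "-j", str(jobs), f"{out_dir}/{db}"])
    return cmds


if __name__ == "__main__":
    for cmd in dump_commands("old-endpoint.example.com", "master", ["app"]):
        print(shlex.join(cmd))
    for cmd in restore_commands("new-endpoint.example.com", "master", ["app"]):
        print(shlex.join(cmd))
```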

See examples at PG Dump DOC and PG Restore DOC.

Postgresql – unable to create schema on amazon rds for postgres

RDS PostgreSQL in this sense is no different from any other PostgreSQL installation. You can, of course, create schemas and whatever objects you want.

The problem here is missing privileges. By default, the owner of the DB (the role that created it) has full access (CREATE, CONNECT, TEMP) on the DB, and can grant these to other roles, too.

Connect to the DB with the role that created it, then you can try to create a schema to prove the above. To make other users able to create schemas inside this DB, do the following:

GRANT ALL ON DATABASE your_db TO andy_k;

After this, you can log in as andy_k, and create schemas.
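If granting ALL feels broader than necessary, the CREATE privilege on the database is the one that allows new schemas to be created. The small sketch below just generates that narrower statement with quoted identifiers; the helper names are made up for illustration, and your_db and andy_k are the placeholder names from the example above.

```python
def quote_ident(name: str) -> str:
    """Quote a SQL identifier by doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def grant_create_sql(database: str, role: str) -> str:
    """Build a GRANT that lets `role` create schemas in `database`."""
    return (f"GRANT CREATE ON DATABASE {quote_ident(database)} "
            f"TO {quote_ident(role)};")


if __name__ == "__main__":
    print(grant_create_sql("your_db", "andy_k"))
```

Quoted identifiers are case-sensitive, so the names must match the case the database and role were created with.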
