Postgresql – How to seamlessly upgrade the major version of an AWS RDS postgres database

amazon-rdspostgresql

This morning I was involved in upgrading a PostgreSQL database on AWS RDS. We wanted to move from version 9.3.3 to version 9.4.4. We had "tested" the upgrade on a staging database, but the staging database is both much smaller, and doesn't use Multi-AZ. It turned out this test was pretty inadequate.

Our production database uses Multi-AZ. We've done minor version upgrades in the past, and in those cases RDS will upgrade the standby first and then promote it to master. Thus the only downtime incurred is ~60s during the failover.

We assumed the same would happen for the major version upgrade, but oh how wrong we were.

Some details about our setup:

db.m3.large
Provisioned IOPS (SSD)
300 GB storage, of which 139 GB is used
We had RDS OS upgrades outstanding, we wanted to batch with this upgrade to minimise downtime

Here are the RDS events logged while we performed the upgrade:

Database CPU was maxed out between about 08:44 and 10:27. A lot of this time seemed to be occupied by RDS taking a pre-upgrade and post-upgrade snapshot.

The AWS docs don't warn of such repercussions, although from reading them it is clear that an obvious flaw in our approach is that we didn't create a copy of the production database in the Multi-AZ setup and try to upgrade it as a trial run

In general it was very frustrating because RDS gave us very little information about what it was doing and how long it was likely to take. (Again, doing a trial run would have helped…)

Apart from that, we want to learn from this incident so here are our questions:

Is this kind of thing normal when doing a major version upgrade on RDS?
If we wanted to do a major version upgrade in the future with minimal downtime, how would we go about it? Is there some kind of clever way to use replication to make it more seamless?

Best Answer

This is a good question,
working in cloud environment is tricky sometimes.

You can use pg_dumpall -f dump.sql command, that will dump your entire database to a SQL file format, In a way that you can reconstruct it from scratch pointing to other endpoint. Using psql -h endpoint-host.com.br -f dump.sql for short.

But to do that, you will need some EC2 instance with some reasonable space in disk (to fit your database dump). Also, you will need to install yum install postgresql94.x86_64 to be able to run dump and restore commands.

See examples at PG Dumpall DOC.

Remember that to keep integrity of your data, it is recommended (some cases it will be mandatory) that you shutdown the systems that connect to the database during this maintenance window.

Also, if you need speed up things, consider using pg_dump instead pg_dumpall, by taking advantage of parallelism (-j njobs) parameter, when you determine the number of CPUs involved in the process, for example -j 8 will use until 8 CPUs. By default the behavior of pg_dumpall or pg_dump is use only 1. The only advantage by using pg_dump instead pg_dumpall is that you will need to run the command for each database that you have, and also dump the ROLES (groups and users) separated.

See examples at PG Dump DOC and PG Restore DOC.

Related Solutions

Mysql – Is it possible to Clone an Existing MYSQL Slave Database into Amazon RDS with the RDS clone being a slave to an external Master

This is completely possible. The only type of import you can use with RDS is a logical restore, so mysqldump. You will want to use the --dump-slave=2 option :

mysqldump --dump-slave=2 (other options) > dump.sql

This will add the CHANGE MASTER statement as a SQL comment to the dump.sql file. Take the binary log file and log position from that statement and set the RDS up accordingly:

CALL mysql.rds_set_external_master (
  host_name
  , host_port
  , replication_user_name
  , replication_user_password
  , <binary log file>
  , <binary log position>
);

[src]

I would caution you that your replication events are going to be sent as plain text, since ssl_encryption is not implemented for RDS MySQL. If your existing master is not in AWS, to be secure you would need to have your RDS instance in a VPC and set up to communicate with a VPN in your existing infrastructure. Read more about this in the docs: https://aws.amazon.com/vpc/

Amazon RDS Backup – Estimating Downtime

Usually, RDS backups don't cause downtime. Similar to EBS backup, the I/O needs to freeze the DB for a split fo the second to get a consistent LVM snapshot. After that, it is irrelevant how long the backup takes. Alternatively, you may consider taking the snapshot on the Read Replica to avoid the impact on Master.

During the automatic backup window, storage I/O might be suspended briefly while the backup process initializes (typically under a few seconds). You might experience elevated latencies for a few minutes during backups for Multi-AZ deployments. For MariaDB, MySQL, Oracle, and PostgreSQL, I/O activity is not suspended on your primary during backup for Multi-AZ deployments, because the backup is taken from the standby. For SQL Server, I/O activity is suspended briefly during backup for Multi-AZ deployments." https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_WorkingWithAutomatedBackups.html

Since you've mentioned that you need to perform RDS Maintenance after the backup. you may want to check out how to avoid downtime altogether during the RDS maintenance. https://workmarket.tech/zero-downtime-maintenances-on-mysql-rds-ba13b51103c2

Disclosure: I work as a Lead DBA for WorkMarket.

Best Answer

Related Solutions

Mysql – Is it possible to Clone an Existing MYSQL Slave Database into Amazon RDS with the RDS clone being a slave to an external Master

Amazon RDS Backup – Estimating Downtime

Related Question