Mysql – Amazon RDS for MySQL vs installing MySQL on an Amazon EC2 instance

amazon ec2amazon-rdsMySQL

At work, we host all our webservers on Amazon EC2 and usually have used MySQL databases installed on the same box as our Apache webserver, and communicated with them on localhost. We now face a need to migrate our database to its own server for one of our systems. I have a choice between two solutions: use Amazon RDS, or just launch a new Amazon EC2 box and install MySQL on it.

RDS, being a dedicated database service provided by the same company as EC2, seems like it ought to be the obviously better option. However, when I look at the pricing for the two options (see http://aws.amazon.com/ec2/pricing and http://aws.amazon.com/rds/pricing) it seems that an RDS server costs almost twice as much as an EC2 server for a box with the same specs.

Given that I'm capable of handling backups myself and that EC2 offers the same ability to scale up the instance as required that RDS does, I can't see any reason at all to use RDS instead of EC2. It seems like I'm probably missing something big, though, because if I were right, nobody would use RDS. What exactly am I missing, and what are the advantages of RDS over installing your own database on an EC2 instance?

Best Answer

I'm a big AWS fan in general... but RDS, not so much.

@RolandoMySQLDBA has pointed out are some pretty good points against it.

The only advantage I see in RDS compared to MySQL on EC2 are the ability to do point and click snapshots, clones, and point-in-time recovery, but these are not nearly sufficient to make up for the loss of control and flexibility and they most certainly don't justify the price being higher. RDS is sexy in some ways, but you can't ultimately trust what you can't ultimately fix, because you can't get to all the moving parts.

I don't like not having the SUPER privilege. I don't like not being able to tail the error log. I don't like not being able to run "top" or "iostat" on my database server to see how the cores and drives are enjoying the load. I don't like not having access to the federated storage engine. I don't like the thought of paying for a hot standyby (multi-AZ) backup master machine that I can't even leverage as a read replica. Sure, there are perfectly reasonable explanations why all of these constraints have to be in place for MySQL to be successfully packaged and sold as RDBMS-in-a-box. The only thing is, RDBMS-in-a-box "solves" a whole series of problems I don't have... and gets in my way, causing new problems.

But the absolute deal-breaker for me with RDS is the complete lack of access to the binary logs and replication. Binlogs, especially row-based, are a fantastic recovery tool for minor disasters, but they are of no help to you if you can't access them. Want to configure an on-premise server at your office as a read-replica of your production database in RDS? Something to take local backups from, do reporting, have on hand for disaster recovery should something unthinkable happen to your data that lives in RDS? That's an idea that is simultaneously obvious and brilliant.

Oops, sorry, direct access to replication is not available. Sure, you can pay for read replicas... but only as other RDS instances. Not a value proposition in my book.

Update: One Significant Change in RDS for MySQL 5.6

I still prefer running my own server (even in EC2) as opposed to running RDS for a number of reasons, including the lack of support for User-Defined Functions, the inability to use the Federated Storage Engine, and the inability to have the one extra connection available for emergency access... however...

Amazon has made a significant change in MySQL 5.6 for RDS that eliminates one of my major objections -- perhaps my largest objection: the binary logs are now accessible and you can run a non-RDS instance as a slave, or connect other utilities to the server that read the binlog stream.

http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/MySQL.Procedural.Exporting.NonRDSRepl.html

Officially, the documentation indicates that they are exposing this so that you can set up a slave for the purpose of doing a live migration -- you synchronize the foreign future master server from the existing RDS instance using mysqldump, then connect it to RDS as a slave to get a live feed of updates via the replication stream until your application is migrated to the new master -- but unofficially, you can do this on an ongoing basis as long as you don't expect them to support you... which, to me, seems reasonable.

This was confirmed in a recent Webinar, in a conversation that begins at around 56:45:

"You can keep it in a replicated state indefinitely...

"...as long as you take the responsibility to maintain the replication..."

"We are not preventing you from doing ongoing replication if that's what you want."

This new capability was enough for me to drop my blanket objection to using RDS in our public-facing website-backing MySQL instances, where we don't use FEDERATED or some of the other things as much or at all.

So I'm still not in favor of it, but I'm no longer against it, since having a live stream of the binary logs puts me ultimately back in control of the data in real time and the responsibility for ensuring that no transactions are lost in a catastrophic outage is back with me, because I, as the DBA, am back in control -- which is exactly how I want it. Having a third party vendor to point fingers at, or file a lawsuit against, or whatever, doesn't get your lost data back if it disappears down a black hole inside a black box.

Management seems to like the "idea" of RDS and doesn't object to the cost difference, so we are now launching all new web sites with RDS behind them.

The point and click point-in-time recovery, I admit, is a nice feature in RDS... it doesn't alter or disrupt your existing machine -- instead, it fires up an entirely new instance, using the backup that was closest-in-time to the selected point in time, and then applies the necessary binlogs to bring that new machine forward to the point in time that you've specified.

Related to this, but in the other direction, it's also possible, now, to use a similar strategy to migrate a live MySQL database into RDS... you can connect an RDS master (presumably, typically, this would be a newly-deployed instance) as a slave of an existing system so that the RDS instance has the live version of the data at the time you migrate into it. Unlike access to the RDS binlogs for outward replication, which only works in 5.6, the inward replication is supported in RDS beginning with 5.5.33 and 5.6.13.