Amazon RDS – Resolve Freezes During mysqldump

amazon-rdsawsMySQLmysqldump

My questions is quite similar to Mysqldump from the Amazon RDS. I have the same behavior. Everytime when I do a backup using mysqldump my entire RDS freezes and hence my application is not responding. I am using –single-transaction with the mysqldump command but that doesn't help at all. With this, RDS is quite unusable since I can not dump a backup or data sample during live operation. Is there a way to get the data out of RDS without freezing everything?

Best Answer

I suggest taking a look at processlist while the DB is dumping. There could be locked MyISAM tables causing queue of statements waiting for locks to be released.

Related Solutions

Mysql – the recommended way to backup a MySQL/Amazon RDS database to S3

The recommended way to back up RDS is with automatic backups and DB snapshots. DB snapshots are basically the same as EBS snapshots, which are stored in S3 behind the scenes, but are only available within the same region.

If you need cross-region fault tolerance (good plan!), there is no way to restore your data in another region without doing it "the hard way" from a mysqldump. Your alternatives are to back up using mysqldump (slow and terrible for any reasonable sized dataset), or set up your own EC2-based slave in another region and back that up using any available method (xtrabackup, EBS snapshots, etc). However, then you are back to managing your own MySQL instances, so you might as well abandon RDS entirely.

For my money, RDS provides absolutely no benefits in really any way and a whole lot of disadvantages in performance, flexibility, and reliability. I would ask yourself what value RDS provides to you.

Mysql – Random write freezes

I think the issue is due to the choice of the clustered index. From MySQL docs, Clustered and Secondary Indexes:

Every InnoDB table has a special index called the clustered index where the data for the rows is stored. Typically, the clustered index is synonymous with the primary key. To get the best performance from queries, inserts, and other database operations, you must understand how InnoDB uses the clustered index to optimize the most common lookup and DML operations for each table.

Also check the answer by @marc_s in this SO question: How to choose the clustered index in SQL Server?, where he mentions:

According to The Queen Of Indexing - Kimberly Tripp - what she looks for in a clustered index is primarily:

Unique

Narrow

Static

And if you can also guarantee:

Ever-increasing pattern

then you're pretty close to having your ideal clustering key!

Now, your clustered index is the (Primary Key):

hash varchar(5) CHARACTER SET latin1 COLLATE latin1_general_cs NOT NULL,

which (lets go through the check-list) is:

Unique (yes, OK)
Narrow (yes, OK)
Static (perhaps, you know that)

but is probably not:

Ever-increasing pattern (No, it probably isn't)

So, what happens when you use a non-ever-increasing clustered index?

I can't answer better than Kimberly L. Trip: Ever-increasing clustering key - the Clustered Index Debate..........again!

If the clustering key is ever-increasing then new rows have a specific location where they can be placed. If that location is at the end of the table then the new row needs space allocated to it but it doesn't have to make space in the middle of the table. If a row is inserted to a location that doesn't have any room then room needs to be made (e.g. you insert based on last name then as rows come in space will need to be made where that name should be placed). If room needs to be made, it's made by SQL Server doing something called a split. Splits in SQL Server are 50/50 splits - simply put - 50% of the data stays and 50% of the data is moved. This keeps the index logically intact (the lowest level of an index - called the leaf level - is a douly-linked list) but not physically intact. When an index has a lot of splits then the index is said to be fragmented. Good examples of an index that is ever-increasing are IDENTITY columns (and they're also naturally unique, natural static and naturally narrow) or something that follows as many of these things as possible - like a datetime column (or since that's NOT very likely to be unique by itself datetime, identity).

Note that despite the mention of SQL-Server, the same concept applies to InnoDB clustered indexes as well. I suppose that the clustered index has 2 issues:

When you are inserting a new row (the "random" hash guarantees that) it gets inserted in a random location of the index. This means that it sometimes will find no space there available to be inserted (note that InnoDB always leaves some space free in the index but when that free-available space is filled) there has to be some rearrangement of the index - and that takes time.
What the rearrangement is also causing over time is fragmentation of the index. Which will eventually make other queries and statements slower.

Best Answer

Related Solutions

Mysql – the recommended way to backup a MySQL/Amazon RDS database to S3

Mysql – Random write freezes

Related Question