AWS RDS MySQL – Diagnosing Failures

amazon-rds, aws, mysql

I have a simple RDS setup: one database, no replication. It has been running for around 2 years without any problems. CPU usage is generally less than 5%, sometimes boosting to around 10%.

Today, without any apparent reason or warning, my application lost connection with the DB. Looking at the log files, I could see the message "Recovery of the DB instance started…" and a few minutes later "Recovery of the DB instance complete". At that point my application was able to reconnect and work fine.

How do I go about diagnosing this further? The log file has about 30 lines in it, starting with "Giving 2 client threads a chance to die gracefully", then "Shutting down slave threads". After that, the service goes through a restart procedure.

Is it normal operation for RDS to 'recover the instance' after a failure like this? Presumably I could lose a few minutes of data?

Update:

The logs are no longer available, so I cannot post them. Also, I notice that the freeable memory jumps up sharply from the time of the incident, which would seem to be a good thing.

Best Answer

I have had this happen, even to production RDSs, though it is rare.

How to go about diagnosing

Check there was no advance notice of maintenance work: Did you get any email notifications from AWS to say that your instance needed mandatory system upgrades or maintenance? Was this in your instance's permitted maintenance window?
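One way to check what RDS itself recorded around the incident is to pull the instance's event history through the API. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured. The keyword list and the function names are made up for illustration, and `describe_events` pagination is ignored for brevity.

```python
# Substrings that suggest a restart, failover, or maintenance event.
# This keyword list is an assumption for illustration, not an official taxonomy.
KEYWORDS = ("recovery", "restart", "shutdown", "failover", "maintenance")


def interesting_events(events):
    """Return (date, message) pairs whose message mentions one of KEYWORDS."""
    hits = []
    for event in events:
        message = event.get("Message", "")
        if any(word in message.lower() for word in KEYWORDS):
            hits.append((event.get("Date"), message))
    return hits


def fetch_recent_events(instance_id, minutes=1440):
    """Fetch recent events for one DB instance (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the filter above works without boto3

    rds = boto3.client("rds")
    response = rds.describe_events(
        SourceIdentifier=instance_id,  # hypothetical identifier, e.g. "mydb"
        SourceType="db-instance",
        Duration=minutes,  # how far back to look, in minutes
    )
    return response["Events"]
```

Calling `interesting_events(fetch_recent_events("mydb"))` would then list events such as "Recovery of the DB instance started" with their timestamps, which can be compared against the instance's maintenance window.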

Raise a ticket with AWS Support: If you have AWS support, raise a ticket and ask them what happened. On the occasions that this has happened and I have raised a ticket, they were not able to give me a good reason for the DB going away, but have generally shrugged and blamed local networking issues on the instance's hypervisor.

Did I lose data? If you are using InnoDB and the shutdown was issued by AWS, it is unlikely that you lost data. The log lines that you cite, "Giving 2 client threads a chance to die gracefully" and then "Shutting down slave threads", come from a command-initiated shutdown rather than a crash. It looks like AWS issued a shutdown for you.
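If you still have the error log, the distinction between an orderly shutdown and a crash can be checked mechanically. This is a rough sketch: the clean-shutdown markers are the lines quoted in the question, while the crash markers are assumptions based on messages MySQL and InnoDB commonly write after an unclean stop, so adjust them to your version.

```python
# Markers quoted in the question; they appear during an orderly shutdown.
CLEAN_MARKERS = (
    "chance to die gracefully",
    "Shutting down slave threads",
)
# Markers assumed to indicate a crash or unclean stop (not taken from the question).
CRASH_MARKERS = (
    "mysqld got signal",
    "Database was not shut down normally",
)


def classify_shutdown(log_lines):
    """Return 'crash', 'clean', or 'unknown' for a list of error-log lines."""
    text = "\n".join(log_lines)
    # Crash markers win: a crash followed by a later clean restart is still a crash.
    if any(marker in text for marker in CRASH_MARKERS):
        return "crash"
    if any(marker in text for marker in CLEAN_MARKERS):
        return "clean"
    return "unknown"
```

For the two lines quoted in the question this returns 'clean', which matches the reading that the shutdown was issued deliberately.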

Other notes: Occasionally AWS moves your instance to another hypervisor without prior notice, for example because the host it was running on has developed hardware problems and needs to be cleared. There might also be networking issues within AWS, or you might be subject to 'noisy neighbours'. There are many possible reasons and, unfortunately, AWS is not contractually obliged to tell you why.

Related Solutions

MySQL 5.5 Runs Out of Memory, Drops All Connections When Creating Many Databases

The first thing that comes to mind is the server model: db.m1.large

What limiting factors are placed on a MySQL RDS?

If you spin up an Amazon RDS instance of MySQL, you subject yourself to whatever constraints come with it. All Amazon RDS MySQL models have the same major options and differ in only two settings:

max_connections
innodb_buffer_pool_size

Here is a chart I posted last month:

MODEL       max_connections  innodb_buffer_pool_size
----------  ---------------  -----------------------
t1.micro                 34    326107136 (  311M)
m1.small                125   1179648000 ( 1125M,  1.097G)
m1.large                623   5882511360 ( 5610M,  5.479G)
m1.xlarge              1263  11922309120 (11370M, 11.103G)
m2.xlarge              1441  13605273600 (12975M, 12.671G)
m2.2xlarge             2900  27367833600 (26100M, 25.488G)
m2.4xlarge             5816  54892953600 (52350M, 51.123G)

I have posted this chart in other answers on DBA Stack Exchange.

You are using m1.large. The InnoDB buffer pool takes 3/4 of the instance's memory, so of roughly 7.2G in total only about 1.8G is left for everything else. That model allows up to 623 connections, and each connection can consume memory of its own because of

join buffers
sort buffers
read buffers
thread info

Amazon RDS is simply micromanaging resources. Since DB Connections can consume RAM, connections are probably being disallowed due to the lack of RAM needed.
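The arithmetic behind this can be reproduced from the chart. The sketch below assumes the RDS default parameter formulas, `innodb_buffer_pool_size = DBInstanceClassMemory*3/4` and `max_connections = DBInstanceClassMemory/12582880`. The function name and the returned keys are made up for illustration, and the per-connection figure is only an average budget, not a limit MySQL enforces.

```python
BYTES_PER_CONNECTION = 12582880  # divisor in the assumed RDS default max_connections formula
GIB = 1024 ** 3


def memory_budget(buffer_pool_bytes):
    """Derive class memory, leftover RAM, and connection limit from the buffer pool size."""
    # Assumption: the buffer pool is 3/4 of the memory RDS exposes to MySQL.
    class_memory = buffer_pool_bytes * 4 // 3
    leftover = class_memory - buffer_pool_bytes
    max_connections = class_memory // BYTES_PER_CONNECTION
    return {
        "class_memory_gib": class_memory / GIB,
        "leftover_bytes": leftover,
        "leftover_gib": leftover / GIB,
        "max_connections": max_connections,
        # Average RAM available per connection if all connections are in use.
        "bytes_per_connection": leftover // max_connections,
    }


if __name__ == "__main__":
    # m1.large row from the chart above.
    for key, value in memory_budget(5882511360).items():
        print(key, value)
```

For the m1.large row this gives a connection limit of 623, matching the chart, and leaves a little over 1.8G outside the buffer pool. Spread over 623 connections, that is only about 3M each for join, sort, and read buffers plus thread overhead.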

SUGGESTIONS

Try reducing InnoDB log I/O by setting innodb_flush_log_at_trx_commit=0 during the mass creation
Make sure you are not doing large transactions during any automatic backups or snapshots
Try a bigger server model, such as m1.xlarge or m2.xlarge
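On RDS, server variables such as innodb_flush_log_at_trx_commit are changed through the instance's DB parameter group rather than in my.cnf. Here is a hedged sketch of the first suggestion using boto3. The group name is hypothetical, the helper names are made up, and it assumes the instance uses a custom parameter group, since default groups cannot be modified.

```python
def flush_log_parameter(value):
    """Build the parameter entry for innodb_flush_log_at_trx_commit."""
    if value not in (0, 1, 2):
        raise ValueError("innodb_flush_log_at_trx_commit must be 0, 1, or 2")
    return {
        "ParameterName": "innodb_flush_log_at_trx_commit",
        "ParameterValue": str(value),
        # The variable is dynamic, so no reboot should be needed.
        "ApplyMethod": "immediate",
    }


def apply_flush_log_setting(group_name, value):
    """Apply the setting to a custom DB parameter group (needs boto3 and credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    rds = boto3.client("rds")
    return rds.modify_db_parameter_group(
        DBParameterGroupName=group_name,  # hypothetical, e.g. "my-custom-mysql"
        Parameters=[flush_log_parameter(value)],
    )
```

A value of 0 trades durability for speed: roughly the last second of committed transactions can be lost in a crash, so set it back to 1 once the mass creation is finished.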

Mysql – AWS RDS Really Odd Error… #1041 OUT OF MEMORY, Buffers OK and Memory OK

I do not believe you are alone here. AWS has pushed me from 5.7.11 to 5.7.17, and now I am unable to run ALTER TABLE commands when there is any sort of memory pressure on the database.

Have a look at https://forums.aws.amazon.com/thread.jspa?threadID=251866

At this point it looks like there may be an issue present in MySQL 5.6 releases before 5.6.37 and 5.7 releases before 5.7.19.

Unfortunately, AWS RDS does not currently offer any 5.6 version newer than 5.6.35 or any 5.7 version newer than 5.7.17.

Before using 5.7.17 I had used 5.7.11 for some time without issue. If you are able to dump/restore your databases that might be the best option for you, otherwise you can give larger instances a shot.
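To check quickly whether a given engine version falls inside the ranges mentioned above, a small comparison helper is enough. This sketch simply encodes the two thresholds from this answer (5.6.37 and 5.7.19); the function names are made up, and it says nothing about whether a particular instance will actually hit the problem.

```python
# First releases outside the suspected range, per release series (from the answer above).
FIXED_IN = {
    (5, 6): (5, 6, 37),
    (5, 7): (5, 7, 19),
}


def parse_version(version):
    """Turn '5.7.17' or '5.7.17-log' into the tuple (5, 7, 17)."""
    numeric = version.split("-")[0]
    return tuple(int(part) for part in numeric.split(".")[:3])


def possibly_affected(version):
    """Return True if the version is in a listed series and older than its threshold."""
    parsed = parse_version(version)
    fixed = FIXED_IN.get(parsed[:2])
    return fixed is not None and parsed < fixed
```

The version string can be taken from `SELECT VERSION();` on the instance. For example, `possibly_affected("5.7.17")` is True, while `possibly_affected("5.7.19")` is False.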