I have two RDS instances: an R/W master and a read-only replica.
On June 29th, the replica stopped registering replication data; I'm not sure if this is related.
On July 3rd, the master's CPU usage started increasing monotonically and drastically.
It's almost at a critical 100% now.
To my knowledge, the query volume hasn't really changed. The only thing that happened around then was that a django-celery daemon in my web tier grabbed a whole CPU core. Force-killing it seemed to fix the problems on the web tier, but it seems likely that the DB tier problems are related.
DB size also started increasing monotonically at the same time.
There are no long-running queries in the processlist, and really no INSERTs at all, so I'm not sure how to find out which tables are growing or where that CPU time is going.
Are there MySQL diagnostics which can show me table size trends? Can I profile all ongoing queries globally? Can I profile global CPU usage?
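One way to get table size trends is to snapshot information_schema.TABLES periodically and diff the snapshots. Here is a minimal sketch in Python, assuming rows come from any DB-API cursor. SNAPSHOT_SQL, to_snapshot and growth are hypothetical names, not an existing tool. For InnoDB, DATA_LENGTH and INDEX_LENGTH are approximate, so treat small deltas as noise.

```python
"""Sketch: diff two snapshots of per-table sizes to find growing tables."""
from typing import Dict, Iterable, List, Tuple

# Hypothetical snapshot query against information_schema.TABLES.
# System schemas are excluded; views have NULL lengths.
SNAPSHOT_SQL = """
SELECT table_schema, table_name, data_length + index_length AS total_bytes
FROM information_schema.tables
WHERE table_schema NOT IN ('mysql', 'information_schema', 'performance_schema')
"""

# (schema, table) -> total bytes (data + indexes)
Snapshot = Dict[Tuple[str, str], int]


def to_snapshot(rows: Iterable[Tuple[str, str, int]]) -> Snapshot:
    """Turn (schema, table, total_bytes) rows into a lookup dict; NULL becomes 0."""
    return {(schema, table): int(total or 0) for schema, table, total in rows}


def growth(before: Snapshot, after: Snapshot, top: int = 10) -> List[Tuple[str, str, int]]:
    """Return up to `top` tables that grew between snapshots, largest growth first.

    Tables that shrank or stayed the same are ignored; tables that appear only
    in `after` count as growing from 0 bytes.
    """
    deltas = []
    for (schema, table), size in after.items():
        delta = size - before.get((schema, table), 0)
        if delta > 0:
            deltas.append((schema, table, delta))
    deltas.sort(key=lambda item: item[2], reverse=True)
    return deltas[:top]
```

For example, run cursor.execute(SNAPSHOT_SQL) and keep to_snapshot(cursor.fetchall()) once an hour, then pass two snapshots to growth() to see which tables account for the increase.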
I've already rebooted the server a couple of times, to no avail.
There's obviously still lots of space left on the server, but when we get to 100% CPU usage, things are going to get ugly, so any help much appreciated!
Best Answer
I have some queries for you regarding table sizes that you can run in MySQL during these spikes:
Database size in terms of StorageEngine (MB)
Database size in terms of Databases (MB)
Database size in terms of Database/StorageEngine (MB)
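The three groupings above can be approximated with a single query against information_schema.TABLES plus client-side grouping. This is a hedged stand-in, not the exact queries meant here. TABLES_SQL and size_mb are hypothetical names. Sizes are DATA_LENGTH + INDEX_LENGTH converted to MB, and the system schemas are skipped.

```python
"""Sketch: data+index size in MB by engine, by database, or by database/engine."""
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical query: fetch raw per-table sizes once, then group in Python.
TABLES_SQL = """
SELECT table_schema, engine, data_length, index_length
FROM information_schema.tables
WHERE engine IS NOT NULL
  AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema')
"""

MB = 1024 * 1024

Row = Tuple[str, str, Optional[int], Optional[int]]


def size_mb(rows: Iterable[Row], by: str) -> Dict[Tuple[str, ...], float]:
    """Sum data+index bytes per group and convert to MB (rounded to 2 places).

    `by` is one of "engine", "database" or "database/engine".
    """
    keyfuncs = {
        "engine": lambda schema, engine: (engine,),
        "database": lambda schema, engine: (schema,),
        "database/engine": lambda schema, engine: (schema, engine),
    }
    keyfunc = keyfuncs[by]  # KeyError for an unknown grouping
    totals: Dict[Tuple[str, ...], int] = defaultdict(int)
    for schema, engine, data_len, index_len in rows:
        totals[keyfunc(schema, engine)] += (data_len or 0) + (index_len or 0)
    return {key: round(total / MB, 2) for key, total in totals.items()}
```

Grouping client-side means one round trip serves all three views. Doing it in SQL instead would use GROUP BY engine, GROUP BY table_schema, or GROUP BY table_schema, engine over the same columns.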
Pay attention to certain markers.
I recommend downloading MySQL Administrator (I know, it's old, but I still use it for quick and dirty "I WANNA SEE STATS NOW" moments of the day) and setting it up. I customized my own graphs to watch the size of the InnoDB Buffer Pool and its dirty pages. You could also just use the Connection Health tab.
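If you would rather sample the buffer pool markers yourself than graph them in a GUI, the counters are exposed by SHOW GLOBAL STATUS as Innodb_buffer_pool_pages_total, Innodb_buffer_pool_pages_data and Innodb_buffer_pool_pages_dirty. Here is a minimal sketch, assuming rows arrive as (Variable_name, Value) string pairs from a DB-API cursor. buffer_pool_markers is a hypothetical helper. If you sample it every minute or so, a dirty-page percentage that keeps climbing would point at write activity, which is worth checking against the "no INSERTs" observation.

```python
"""Sketch: compute InnoDB buffer pool fill and dirty-page percentages."""
from typing import Dict, Iterable, Tuple

# LIKE also matches other counters (e.g. ..._flushed); non-numeric values are skipped.
STATUS_SQL = "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages%'"


def buffer_pool_markers(rows: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Return data and dirty pages as a percentage of all buffer pool pages."""
    status = {name: int(value) for name, value in rows if value.isdigit()}
    total = status["Innodb_buffer_pool_pages_total"]
    data = status["Innodb_buffer_pool_pages_data"]
    dirty = status["Innodb_buffer_pool_pages_dirty"]
    return {
        "data_pct_of_total": round(100.0 * data / total, 2),
        "dirty_pct_of_total": round(100.0 * dirty / total, 2),
    }
```

Usage would be cursor.execute(STATUS_SQL) followed by buffer_pool_markers(cursor.fetchall()), logging the result on each sample.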