Managed to solve this; these are the steps I followed:
Firstly, I contacted the Amazon RDS team by posting on their discussion forum. They confirmed it was the mysqld process taking up all this CPU, which ruled out a configuration fault or some other process running on the physical server.
Secondly, I tracked down the source of the queries that were running:
SELECT `mytable`.* FROM `mytable` WHERE `mytable`.`foreign_key` = 231273 LIMIT 1
I originally overlooked this as the cause, because none of these queries seemed to be taking particularly long when I monitored the SHOW PROCESSLIST output. After exhausting other avenues, I decided it might be worth following up... and I'm glad I did.
As you can see in the SHOW PROCESSLIST output, these queries were coming from a utility server, which runs some tactical utility jobs that sit outside our main application code. This is why they were not showing up as slow or causing issues in our New Relic monitoring: the New Relic agent is only installed on our main app server.
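For anyone retracing these steps, the host information that gave this away is in the processlist itself. A minimal sketch (any account with the PROCESS privilege can see all connections):

-- Show every connection along with the full text of whatever it is running;
-- the Host column contains client-host:port, which is what pointed at the
-- utility server here.
SHOW FULL PROCESSLIST;

-- The same data is also available as a table you can filter and sort:
SELECT id, user, host, db, command, time, state, info
FROM information_schema.PROCESSLIST
ORDER BY time DESC;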
Loosely following this guide:
http://www.mysqlperformanceblog.com/2007/02/08/debugging-sleeping-connections-with-mysql/
I was able to trace these queries to a specific running process on our utility server box. This was a bit of Ruby code that was very inefficiently iterating through around 70,000 records, checking some field values and using those to decide whether it needed to create a new record in `mytable`. After doing some analysis I was able to determine that the process was no longer needed, so it could be killed.
Something that was making matters worse: there seemed to be six instances of this same process running at once, due to the way the cron job was configured and how long each run took. I killed off these processes and, incredibly, our CPU usage fell from around 100% to around 5%!
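If you're in a similar situation, it's also worth a quick EXPLAIN of the repeated query to confirm whether the lookups are indexed -- in our case each individual query was cheap and the problem was sheer volume, but it only takes a second to check. A sketch using the query from the processlist above:

-- If the "key" column comes back NULL, every iteration of the loop is doing
-- a full table scan rather than an indexed lookup.
EXPLAIN SELECT `mytable`.* FROM `mytable` WHERE `mytable`.`foreign_key` = 231273 LIMIT 1;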
From Windows Platform Limitations in the MySQL 5.5 Reference Manual:
On Windows 32-bit platforms, it is not possible by default to use more than 2GB of RAM within a single process, including MySQL.
What's to investigate? We already know what happens when you max the memory: Really Bad Things™.
If there's a remote chance that your setup will periodically require this much memory, 32-bit Windows is the wrong platform.
Almost universally, the biggest memory consumer is of course the InnoDB Buffer Pool, but there's no need for any queries to test this: you don't need any activity at all, or even any tables, because the entire amount of memory declared for innodb_buffer_pool_size is allocated immediately when the server starts. The buffer pool never grows, never shrinks, never changes, ever. The number of free pages changes, but those are not "free" from the operating system's perspective -- they're still just as allocated, merely marked as containing nothing of interest within InnoDB.
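You can watch this on a running server if you like: the page counters show a fixed total, with only the free/data split moving around. A quick sketch (these are the standard InnoDB status counters and variables):

-- Total pages in the buffer pool is fixed at startup; only the split between
-- free, data and misc pages changes as the server does work.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages%';

-- The configured pool size itself, in bytes:
SELECT @@innodb_buffer_pool_size;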
If the operating system refuses to allocate the amount of memory provisioned for the buffer pool, MySQL will simply refuse to start.
This is illustrated here, where the OP mistakenly believed that the server was crashing and restarting "because" memory couldn't be allocated for the buffer pool. It was in fact crashing for a Linux-specific reason, and then refusing to restart because the system would not allocate the total amount of memory required for the pool, due to overuse of available memory by something else... but this allocate-all-at-startup behavior for the InnoDB buffer pool is not platform-specific.
So, you should be able to set this value near the max and then find that taking the server process over the edge should not require very much additional effort at all. But I'm still not sure what the point is.
As you realize, MySQL uses memory for a variety of different purposes, several of which are dynamically-sized, definable on a per-connection basis, and allocated on demand, which makes it virtually impossible to provision a server based on limiting memory usage to some worst-case scenario absolute value, yet expecting that server to be able to handle its typical load efficiently.
The simplest illustration of this is the fact that you can obviously reduce the theoretical maximum memory utilization of a given instance by restricting the maximum number of simultaneous client connections... but any given application needs a certain number of available connections to perform efficiently, and if that number is below your target value, then you're not really solving anything -- it just feels like you have.
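For what it's worth, the usual back-of-the-envelope "worst case" figure can be computed straight from the server variables. This is a sketch only, and a deliberately pessimistic one, since it assumes every connection allocates every per-connection buffer at full size at the same time -- which essentially never happens, and which is exactly why such a number is of limited value:

-- Naive theoretical maximum: global buffers plus the per-connection buffers
-- multiplied by the maximum number of simultaneous connections.
SELECT ( @@innodb_buffer_pool_size
       + @@innodb_log_buffer_size
       + @@key_buffer_size
       + @@query_cache_size
       + @@max_connections * ( @@sort_buffer_size
                             + @@read_buffer_size
                             + @@read_rnd_buffer_size
                             + @@join_buffer_size
                             + @@thread_stack
                             + @@max_allowed_packet ) )
       / 1024 / 1024 / 1024 AS naive_worst_case_gb;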
I say, either your server has enough memory for the workload, or it doesn't. If it doesn't, then attempting to "tune" your way out of potential trouble is unlikely to offer much in the way of solutions.
Some ideas on how to easily generate demand for more memory...
SELECT * FROM large_table ORDER BY non_indexed_column;
SELECT * FROM large_table WHERE non_indexed_column = some_value;
SELECT * FROM large_table WHERE some_column LIKE '%a_frequent_match%';
Queries like these could trigger the allocation of a sort buffer, a read buffer and/or a random read buffer, which should make a new request to the OS for the memory that buffer requires.
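Those buffers are allocated per connection and sized by per-session variables, so the more concurrent clients running queries like the ones above, the more copies get requested from the OS. Their configured sizes can be checked like this (a sketch):

-- Per-connection buffer sizes; each is allocated on demand when a query needs it.
SHOW VARIABLES
WHERE Variable_name IN ('sort_buffer_size', 'read_buffer_size',
                        'read_rnd_buffer_size', 'join_buffer_size');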
Simple solution:
DELIMITER $$

DROP PROCEDURE IF EXISTS `test`.`eat_memory_until_server_crashes` $$
CREATE PROCEDURE `test`.`eat_memory_until_server_crashes`()
BEGIN
  -- this procedure is intended to eat as much memory as it can
  -- it creates a series of consecutively-numbered session variables as large
  -- as your configuration will allow them to be.
  -- do not run this unless you intend to crash your server
  -- also, do not run from a gui tool -- use the mysql command line client:
  --   mysql> CALL test.eat_memory_until_server_crashes;
  -- if you kill the query or thread before the server crashes,
  -- the memory consumed will be returned to the OS
  DECLARE counter INT DEFAULT 0;
  LOOP
    SET counter = counter + 1;
    SET @qry = CONCAT('SET @crash_me_',counter,' := REPEAT(\'a\', @@max_allowed_packet)');
    SELECT counter, @qry;
    PREPARE hack FROM @qry;
    EXECUTE hack;
    DEALLOCATE PREPARE hack;
    -- adjust timing or remove this entirely depending on how quickly you want this to happen
    DO SLEEP(0.1);
  END LOOP;
END $$

DELIMITER ;
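As the comments above say, you can back out before the crash from a second connection. A sketch (the 12345 below is a placeholder for the Id you see in your own processlist):

-- From another session, find the thread running the procedure...
SHOW PROCESSLIST;
-- ...then stop it. KILL QUERY ends the running statement but keeps the
-- connection (and its already-created session variables); KILL ends the
-- connection entirely, releasing everything it allocated.
KILL QUERY 12345;
-- KILL 12345;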
Inspiration for this: Schwartz, Baron; Zaitsev, Peter; Tkachenko, Vadim (2012-03-05). High Performance MySQL: Optimization, Backups, and Replication (Kindle Location 12194). O'Reilly Media. Kindle Edition.
Best Answer
20-40ms is about a thousand miles of latency. Where are the client and server relative to each other? Is Amazon moving the server around? Or the client?
innodb_buffer_pool_size is set reasonably by Amazon (I think). Do not raise it without raising the amount of RAM. There is no benefit in making the buffer_pool much bigger than the entire dataset (data & indexes). For an 8GB VM (which is probably only 7.5GB), 6G is about the max that is safe -- assuming that only MySQL is running in that VM.
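If you want to know how big "the entire dataset (data & indexes)" actually is, information_schema will give you a rough figure (a sketch; the numbers are estimates, especially for InnoDB):

-- Approximate total of data plus indexes across user schemas, in GB.
SELECT ROUND(SUM(data_length + index_length) / 1024 / 1024 / 1024, 2) AS total_gb
FROM information_schema.TABLES
WHERE table_schema NOT IN ('mysql', 'information_schema', 'performance_schema');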
Turning on the QC (query cache) is usually a waste on production systems (except on Aurora). Also, do not set query_cache_size bigger than about 50M -- else it could slow things down.
20% CPU -- percent of one core? Of all cores? 20% of all cores seems high. What is your slowest query? Also, provide the output of SHOW CREATE TABLE.
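To find the slowest query, the slow query log is the usual starting point. A sketch (on RDS these settings normally go in the DB parameter group rather than SET GLOBAL, and the 1-second threshold is just an example):

-- Log every statement that runs longer than 1 second.
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;
SHOW VARIABLES LIKE 'slow_query_log%';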