You can have a high load from lots of short queries as well. Try setting your long_query_time to 0, and force all clients to reconnect. Let that run for a few minutes and then set it back to your default. You can peruse the results by hand or use a tool like pt-query-digest.
Either way, you should be able to figure out whether there is in fact a large volume of queries running, each just finishing very quickly.
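A minimal sketch of that capture-everything approach (assuming 10 seconds is your normal threshold - substitute your own default):

SET GLOBAL slow_query_log = 1;
-- log every query, however fast; the global value only applies to
-- connections opened after the change, hence the forced reconnects
SET GLOBAL long_query_time = 0;
-- ... let it run for a few minutes, then restore the threshold
SET GLOBAL long_query_time = 10;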
Managed to solve this; these are the steps I followed:
Firstly, I contacted the Amazon RDS team by posting on their discussion forum. They confirmed it was the mysqld process taking up all this CPU, which ruled out a fault with something else running on the physical server.
Secondly, I tracked down the source of the queries that were running:
SELECT `mytable`.* FROM `mytable` WHERE `mytable`.`foreign_key` = 231273 LIMIT 1
I originally overlooked this as the cause, because none of these queries seemed to be taking particularly long when I monitored the show processlist output. After exhausting other avenues, I decided it might be worth following up, and I'm glad I did.
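In hindsight, a quick EXPLAIN would have shown how much work each execution was doing - a minimal sketch, reusing the query above (whether it can use an index depends on the schema):

EXPLAIN SELECT `mytable`.* FROM `mytable` WHERE `mytable`.`foreign_key` = 231273 LIMIT 1;

A type of ALL with a large rows estimate means each execution scans the whole table.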
As you can see in the show processlist output, these queries were coming from a utility server, which runs some tactical utility jobs that exist outside of our main application code. This is why they were not showing up as slow or causing issues in our New Relic monitoring: the New Relic agent is only installed on our main app server.
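For anyone retracing this: the origin of a query shows up in the Host column of the processlist. A sketch of pulling just the suspect statements (MySQL 5.1+; the table name is from the query above):

SHOW FULL PROCESSLIST;
-- or a filterable view of the same data
SELECT id, user, host, db, time, info
FROM information_schema.PROCESSLIST
WHERE info LIKE '%mytable%';

The host value includes the client machine and its port, e.g. something like 'utilitybox:52311' (hostname hypothetical).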
Loosely following this guide:
http://www.mysqlperformanceblog.com/2007/02/08/debugging-sleeping-connections-with-mysql/
I was able to trace these queries to a specific running process on our utility server box. This was a bit of ruby code that was very inefficiently iterating through around 70,000 records, checking some field values and using those to decide whether it needed to create a new record in 'mytable'. After doing some analysis, I was able to determine that the process was no longer needed, so it could be killed.
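The key trick from that guide, roughly: the port in the Host column belongs to the client side of the connection, so it can be matched to a process on the originating box. A sketch, with a hypothetical connection id of 42:

-- client host and ephemeral port for a suspect connection
SELECT SUBSTRING_INDEX(host, ':', 1)  AS client_host,
       SUBSTRING_INDEX(host, ':', -1) AS client_port
FROM information_schema.PROCESSLIST
WHERE id = 42;

On the client machine, something like lsof -i TCP:<client_port> (or netstat -np) then names the owning process.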
Something that was making matters worse: there seemed to be 6 instances of this same process running at one time, due to the way the cron job was configured and how long each run took! I killed off these processes and, incredibly, our CPU usage fell from around 100% to around 5%!
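A quick way to spot that kind of pile-up from the database side is to count connections per client host - a sketch, again using information_schema.PROCESSLIST:

SELECT SUBSTRING_INDEX(host, ':', 1) AS client_host, COUNT(*) AS connections
FROM information_schema.PROCESSLIST
GROUP BY client_host;

Six near-identical connections from one box would stand out straight away; wrapping the cron entry in something like flock -n also prevents runs from overlapping.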
Best Answer
You can see which queries are slow by enabling the slow query log:

SET GLOBAL slow_query_log=1;

and setting a suitable value for the long_query_time variable as a threshold for what you consider to be "long" (the unit is seconds), e.g. maybe:

SET GLOBAL long_query_time=3;

However, queries may be slow for other reasons than high CPU utilisation, e.g. due to high I/O utilisation. Still, this can be a starting point for at least identifying poor queries. The slow log is by default written to a file in the data dir called '${HOSTNAME}-slow.log'. You can then use a tool like mysqldumpslow to analyse this file, e.g.:
mysqldumpslow -s t /var/lib/mysql/localhost-slow.log
This will extract each slow query and order them from slowest to least slow. Or you can use pt-query-digest to do the same thing. You can then enable query profiling for your session:
SET profiling = 1;
Execute each of the candidate queries you have identified in the slow query log, and then run:

SHOW PROFILES;

This gives you a list of the recent queries and their query IDs. Use that ID to find the CPU utilisation for each query with:

SHOW PROFILE CPU FOR QUERY <query_id>;
This will show the various stages of the query execution, and the duration and CPU utilisation (user and system) for each of them.
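Putting those steps together on the suspect query from this question (a sketch - the id that SHOW PROFILES assigns will vary):

SET profiling = 1;
SELECT `mytable`.* FROM `mytable` WHERE `mytable`.`foreign_key` = 231273 LIMIT 1;
SHOW PROFILES;                -- note the Query_ID of the statement above
SHOW PROFILE CPU FOR QUERY 1; -- per-stage Duration, CPU_user and CPU_system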
Note that this isn't 100% accurate; the documentation page for SHOW PROFILE lists some limitations of how the profiling figures are gathered.
Note also that the page says SHOW PROFILE is deprecated as of MySQL 5.6.7 and that the Performance Schema is meant to replace it; see Query Profiling Using Performance Schema. However, there doesn't seem to be a way to extract CPU utilisation info from it, so you may just have to use SHOW PROFILE CPU ...