I managed to solve this. These are the steps I followed:
Firstly, I contacted the Amazon RDS team by posting on their discussion forum. They confirmed that the mysqld process was taking up all this CPU, which ruled out a configuration fault with something else running on the physical server.
Secondly, I tracked down the source of the queries that were running:
SELECT `mytable`.* FROM `mytable` WHERE `mytable`.`foreign_key` = 231273 LIMIT 1
I originally overlooked this as the cause, because none of these queries seemed to take particularly long when I monitored the SHOW PROCESSLIST output. After exhausting other avenues, I decided it might be worth following up... and I'm glad I did.
As you can see in the SHOW PROCESSLIST output, these queries were coming from a utility server, which runs some tactical utility jobs that exist outside of our main application code. This is why they were not showing up as slow or causing issues in our New Relic monitoring: the New Relic agent is only installed on our main app server.
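The point above is that thousands of individually fast queries can still saturate the CPU. One way to make that visible is to sample the Info column of SHOW PROCESSLIST repeatedly and group the samples by "fingerprint", i.e. with literal values stripped out. The sketch below is my own illustration, not what I actually ran; the function names and the sampling approach are assumptions.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def fingerprint(sql: str) -> str:
    """Collapse literal values so queries that differ only by values group together."""
    sql = re.sub(r"'(?:[^'\\]|\\.)*'", "?", sql)  # single-quoted string literals
    sql = re.sub(r"\b\d+\b", "?", sql)            # standalone numeric literals
    return re.sub(r"\s+", " ", sql).strip()       # normalise whitespace


def top_fingerprints(samples: Iterable[Optional[str]], n: int = 5) -> List[Tuple[str, int]]:
    """Count fingerprints across processlist samples (None = idle connection)."""
    counts = Counter(fingerprint(s) for s in samples if s)
    return counts.most_common(n)
```

If the same fingerprint dominates sample after sample, it is a candidate even though no single execution looks slow.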
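To spot which machine the load comes from, the Host column of SHOW PROCESSLIST (which reads "client_ip:client_port" for TCP connections) can be tallied per client IP. This is a minimal sketch. The row layout mirrors the processlist columns, while the helper names and the substring filter are hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class ProcessRow(NamedTuple):
    id: int
    user: str
    host: str            # "client_ip:client_port" for TCP, or e.g. "localhost"
    db: Optional[str]
    command: str         # "Query", "Sleep", ...
    time: int
    state: Optional[str]
    info: Optional[str]  # the SQL text, if any


def client_ip(host: str) -> str:
    # rsplit so only the trailing ":port" is removed
    return host.rsplit(":", 1)[0] if ":" in host else host


def queries_by_client(rows: Iterable[ProcessRow], needle: str) -> Counter:
    """Count running queries containing `needle`, grouped by client IP."""
    counts: Counter = Counter()
    for row in rows:
        if row.command == "Query" and row.info and needle in row.info:
            counts[client_ip(row.host)] += 1
    return counts
```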
Loosely following this guide:
http://www.mysqlperformanceblog.com/2007/02/08/debugging-sleeping-connections-with-mysql/
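As I read it, the guide's core trick is that the client port in the processlist Host column identifies the connection on the client machine. There, `netstat -ntp` (run as root to see other users' PIDs) shows which process owns that local port. Below is a minimal sketch that parses that output, assuming the common Linux net-tools column layout and MySQL on port 3306. The function name is hypothetical.

```python
from typing import Optional, Tuple


def pid_for_port(netstat_output: str, client_port: int,
                 mysql_port: int = 3306) -> Optional[Tuple[int, str]]:
    """Find (pid, program) owning a TCP connection from client_port to MySQL.

    Assumes `netstat -ntp` columns: Proto Recv-Q Send-Q Local Foreign State PID/Program.
    """
    for line in netstat_output.splitlines():
        fields = line.split()
        # skip banner and header lines, which do not start with "tcp"
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        local, foreign, pid_prog = fields[3], fields[4], fields[6]
        if (local.rsplit(":", 1)[-1] == str(client_port)
                and foreign.rsplit(":", 1)[-1] == str(mysql_port)):
            if pid_prog == "-":
                return None  # PID hidden: netstat was not run with enough privileges
            pid, _, program = pid_prog.partition("/")
            return int(pid), program
    return None
```

The output string would come from something like `subprocess.run(["netstat", "-ntp"], capture_output=True, text=True).stdout` on the utility box. The PID can then be looked up with `ps` to see the full command line.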
I was able to trace these queries to a specific running process on our utility server box. This was a bit of Ruby code that was very inefficiently iterating through around 70,000 records, checking some field values and using those to decide whether it needed to create a new record in 'mytable'. After some analysis, I determined that the process was no longer needed and could be killed.
To make matters worse, six instances of this same process were running at once, because of the way the cron job was configured and how long each run took! I killed off these processes and, incredibly, our CPU usage fell from around 100% to around 5%!
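Looping over ~70,000 records with one `SELECT ... WHERE foreign_key = ? LIMIT 1` each means ~70,000 round trips per run. The sketch below contrasts that per-row pattern with checking keys in batches using `IN (...)`. It uses Python's built-in sqlite3 as a stand-in for MySQL so it runs anywhere. The schema, batch size, and function names are hypothetical, and the sketch only finds missing keys rather than creating records.

```python
import sqlite3
from typing import Iterable, List, Tuple


def make_db(existing_keys: Iterable[int]) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, foreign_key INTEGER)")
    conn.executemany("INSERT INTO mytable (foreign_key) VALUES (?)",
                     [(k,) for k in existing_keys])
    return conn


def missing_keys_per_row(conn: sqlite3.Connection, keys: List[int]) -> Tuple[List[int], int]:
    """One query per key, like the job seen in the processlist."""
    missing, queries = [], 0
    for k in keys:
        queries += 1
        row = conn.execute("SELECT * FROM mytable WHERE foreign_key = ? LIMIT 1", (k,)).fetchone()
        if row is None:
            missing.append(k)
    return missing, queries


def missing_keys_batched(conn: sqlite3.Connection, keys: List[int],
                         batch_size: int = 500) -> Tuple[List[int], int]:
    """One query per batch of keys, same result in far fewer round trips."""
    missing, queries = [], 0
    for i in range(0, len(keys), batch_size):
        batch = keys[i:i + batch_size]
        placeholders = ",".join("?" * len(batch))  # e.g. "?,?,?"
        queries += 1
        found = {r[0] for r in conn.execute(
            f"SELECT foreign_key FROM mytable WHERE foreign_key IN ({placeholders})", batch)}
        missing.extend(k for k in batch if k not in found)
    return missing, queries
```

With 70,000 keys and a batch size of 500, the batched version issues 140 queries instead of 70,000.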
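The pile-up happens when a run takes longer than the cron interval. In the worst case, about ceil(runtime / interval) copies are alive at once. A common guard is for each run to take a non-blocking exclusive lock and exit if it can't get one. This is a minimal Unix-only sketch using `fcntl.flock`. The lock path and the example numbers are hypothetical, since the post doesn't give the actual schedule.

```python
import fcntl
import math
from typing import IO, Optional


def max_overlapping_runs(interval_minutes: float, runtime_minutes: float) -> int:
    """Worst-case number of concurrent copies if cron starts a run every
    `interval_minutes` and nothing stops a new run while an old one is active."""
    return max(1, math.ceil(runtime_minutes / interval_minutes))


def try_single_instance_lock(path: str) -> Optional[IO[str]]:
    """Return an open file holding an exclusive lock, or None if another run holds it.

    Keep the returned file open for the whole job; closing it releases the lock.
    """
    fh = open(path, "w")
    try:
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        fh.close()
        return None
    return fh
```

For example, a job that takes 55 minutes but is scheduled every 10 minutes gives `max_overlapping_runs(10, 55) == 6`. On Linux, wrapping the cron command in util-linux's `flock -n /path/to/lockfile ...` achieves the same effect without changing the job's code.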
Since you have already tried the above steps, your sticking point seems to be the one thing you can't alter.
I wrote a post back in August 2012 (Local database vs Amazon RDS) explaining how the number of DB connections and the buffer size are fixed and immutable for each RDS server model. I also mentioned that the transaction logs are fixed at 128M, no matter which RDS server model you use.
SUGGESTION: Be kind to yourself. Migrate the DB from RDS to EC2. Then you have the flexibility you need to change the InnoDB infrastructure. Consequently, you will have to write your own backups and set up replication by hand.
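To see what a given instance is actually running with, compare the output of `SHOW VARIABLES` for the settings mentioned above. This is a minimal sketch that takes the name/value pairs (as strings, the way the client returns them) and reports sizes in megabytes. The function name and the choice of variables are my assumptions. `innodb_log_file_size` is the per-file transaction log size.

```python
from typing import Dict

# Settings discussed above: connection limit, buffer pool, InnoDB transaction logs.
RELEVANT = (
    "innodb_buffer_pool_size",
    "max_connections",
    "innodb_log_file_size",
    "innodb_log_files_in_group",
)


def summarize_innodb_limits(variables: Dict[str, str]) -> Dict[str, str]:
    """Pick out the relevant SHOW VARIABLES entries; byte sizes become e.g. '128M'."""
    summary: Dict[str, str] = {}
    for name in RELEVANT:
        if name not in variables:
            continue
        value = int(variables[name])
        summary[name] = f"{value / 1024**2:.0f}M" if name.endswith("_size") else str(value)
    return summary
```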
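"Write your own backups" can start as small as a scheduled `mysqldump`. The sketch below only builds and runs the command. `--single-transaction` takes a consistent InnoDB snapshot without locking tables, and `--routines`/`--triggers` include stored code. The user, host, directory, and file naming are hypothetical. Credentials are assumed to come from a MySQL option file rather than the command line.

```python
import datetime
import subprocess
from pathlib import Path
from typing import List, Optional


def mysqldump_command(database: str, backup_dir: str, user: str = "backup",
                      host: str = "127.0.0.1",
                      now: Optional[datetime.datetime] = None) -> List[str]:
    """Build a mysqldump invocation writing to a timestamped file.

    The password is deliberately not passed here; it is assumed to be in an
    option file (e.g. ~/.my.cnf) so it doesn't appear in the process list.
    """
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    outfile = Path(backup_dir) / f"{database}-{now:%Y%m%d-%H%M%S}.sql"
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",
        "--single-transaction",  # consistent InnoDB snapshot, no table locks
        "--routines",
        "--triggers",
        f"--result-file={outfile}",
        database,
    ]


def run_backup(cmd: List[str]) -> None:
    # check=True raises CalledProcessError if mysqldump exits non-zero
    subprocess.run(cmd, check=True)
```

Replication setup is a separate exercise. This only covers the logical backup side.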
I know you said going to EC2 is your last resort. Since you cannot tune InnoDB effectively in an RDS environment, EC2 is the next logical step.
Aurora runs as MySQL 5.6.10a, so the comment queries still work as expected, just as they do in regular MySQL.