Mysql – Too Many database connections on Amazon RDS

amazon-rdsMySQL

We are having problems with users running queries/views in Drupal that occasionally cause our site to freeze. The freeze occurs because the query causes the number of database connections to go up to 400+ and basically anytime the site goes over 100 database connections the site slows down terribly and just doesn't respond.

We are running Amazon RDS using MySQL Red Hat Linux

We have a large enough EC2 on the front end app server, and a large enough RDS.

The way we are fixing this issue now is to find the offending query, and kill it. Once the query is killed…our database connections drop to around 20 which is the normal amount you see when monitoring the site statistics.

Is there a way to stop the offending query and kill it before it runs too long and consumes the connections? I am trying to automate the killing of the bad query before it happens, or at least realize after 30 seconds its a bad query and kill it.

Best Answer

Here is a Stored Procedure to kill long running SELECTs

DELIMITER $$

DROP PROCEDURE IF EXISTS `test`.`Kill_Long_Running_Selects` $$
CREATE PROCEDURE `test`.`Kill_Long_Running_Selects` (time_limit INT,display INT)
BEGIN

    DECLARE ndx,lastndx INT;

    DROP TABLE IF EXISTS test.LongRunningSelects;
    CREATE TABLE test.LongRunningSelects
    (
        id INT NOT NULL AUTO_INCREMENT,
        idtokill BIGINT,
        PRIMARY KEY (id)
    ) ENGINE=MEMORY;
    INSERT INTO test.LongRunningSelects (idtokill)
    SELECT id FROM information_schema.processlist
    WHERE user<>'system user' AND info regexp '^SELECT' AND time > time_limit;

    SELECT COUNT(1) INTO lastndx FROM test.LongRunningSelects;
    SET ndx = 0;
    WHILE ndx < lastndx DO
        SET ndx = ndx + 1;
        SELECT idtokill INTO @kill_id
        FROM test.LongRunningSelects WHERE id = ndx;
        CALL mysql.rds_kill(@kill_id);
    END WHILE;

    IF lastndx > 0 THEN
        IF display = 1 THEN
            SELECT GROUP_CONCAT(idtokill) INTO @idlist FROM test.LongRunningSelects;
            SELECT @idlist IDs_KIlled;
            SELECT CONCAT('Processes Killed : ',lastndx) Kill_Long_Running_Selects;
        END IF;
    END IF;

END $$

To kill SELECTs running longer than 30 seconds, you run this

CALL test.Kill_Long_Running_Selects(30,0);

If you want to see the connections being killed, you run this

CALL test.Kill_Long_Running_Selects(30,1);

Perhaps you can create a MySQL Event to call this Stored Procedure every minute.

If Amazon does not let you have the EVENT privilege, you will have to write an external shell script on the EC2 server to connect to the DB and run the Stored Procedure. That shell script can be put into a crontab.

If Amazon does not let you have the PROCESS and SUPER privileges, you may need to move the DB out of RDS and into another EC2 instance running MySQL to accomplish this. You could then create the MySQL Event without Amazon's hosting restrictions.

Related Solutions

Mysql – Struggling to debug high CPU usage on Amazon RDS MySQL instance

Managed to solve this, these are the steps I followed:

Firstly, I contacted the Amazon RDS team by posting on their discussion forum, they confirmed it was the mysqld process taking up all this CPU - this eliminated a configuration fault with something else running on the physical server

Secondly I tracked down the source of the queries that were running:

SELECT `mytable`.* FROM `mytable` WHERE `mytable`.`foreign_key` = 231273 LIMIT 1

I originally overlooked this as the cause, because none of these queries seemed to be taking particularly long when I monitored the show processlist output. After exhausting other avenues, I decided it might be worth following up....and I'm glad I did.

As you can see in the show processlist output, these queries were coming from a utlility server, which runs some tactical utility jobs that exist outside of our main application code. This is why they were not showing up as slow or causing issues in our new relic monitoring, because the new relic agent is only installed on our main app server.

Loosely following this guide:

http://www.mysqlperformanceblog.com/2007/02/08/debugging-sleeping-connections-with-mysql/

I was able to trace these queries to a specific running process on our utility server box. This was a bit of ruby code that was very inefficiently iterating through around 70,000 records, checking some field values and using those to decide whether it needs to create a new record in 'mytable.' After doing some analysis I was able to determine, the process was no longer needed so could be killed.

Something that was making matters worse, there seemed to be 6 instances of this same process running at one time due to the way the cron job was configured and how long each one took! I killed off these processes, and incredibly our CPU usage fell from around 100% to around 5%!

MySQL Deadlock – Resolving Concurrent Connections in Amazon RDS

The short answer is no: I can't think of a difference from MySQL defaults that would cause this.

The longer answer is that it is possible to reduce some locking with READ-COMMITTED as the isolation level + Row-based Replication. To which RDS does not support switching to Row-based Replication :(

Whether or not this reduction in locking will help you depends on your schema. See my blog post here under "Write scalability of certain statements".

Best Answer

Related Solutions

Mysql – Struggling to debug high CPU usage on Amazon RDS MySQL instance

MySQL Deadlock – Resolving Concurrent Connections in Amazon RDS

Related Question