Amazon RDS – What Consumes Storage Space Beyond Database Files?

amazon-rdsdisk-space

I recently right-sized the data and log files for an RDS database, leaving just a few GB of unallocated space. Shortly thereafter, something allocated, then deallocated, 1 GB of the margin of error. What might have caused this?

All of the databases – including tempdb – are either tiny (MB) or at exactly the size they've been set; they add up to 197 GB, leaving 3 GB free on a 200 GB instance. And yet:

Clearly, if this happens again in a slightly greater magnitude, there will be a problem. For now, I can simply shrink one of the database files to give a larger margin, but I'd like to understand what actually happened.

Best Answer

AWS support came back with a comprehensive answer. In this case, it was transaction log backups, being written to the SQL disk before offloading to S3. There are a few more things which can consume space, mostly temporarily. As their tech outlined:

As a best Practice, I would recommend to keep at around 5% free space for similar processes (Log Backups, Error Log Files, Trace Files, etc), however this may vary based on high transactional volume or bulk data upload process , which may need high disk storage temporarily.

Related Solutions

Database Comparison – Local Database vs Amazon RDS

You should fully tune the MySQL Environment, particularly your InnoDB settings. (See the bottom of my Answer for tuning tips). This would be much better than fighting Amazon for elbow room in RAM/Disk. Why did I say fight?

If you just spun up an Amazon RDS instance of MySQL, you would subject yourself to whatever constraints are given. All models of MySQL Amazon RDS have the same major options but differ in only two aspects

max_connections
innodb_buffer_pool_size

Here is a Chart I posted a Week Ago : https://dba.stackexchange.com/a/21498/877

MODEL      max_connections innodb_buffer_pool_size
---------  --------------- -----------------------
t1.micro   34                326107136 (  311M)
m1-small   125              1179648000 ( 1125M,  1.097G)
m1-large   623              5882511360 ( 5610M,  5.479G)
m1-xlarge  1263            11922309120 (11370M, 11.103G)
m2-xlarge  1441            13605273600 (12975M, 12.671G)
m2-2xlarge 2900            27367833600 (26100M, 25.488G)
m2-4xlarge 5816            54892953600 (52350M, 51.123G)

The larger the model, the bigger the InnoDB Buffer Pool. Oh yeah, the bigger the budget needs to be.

What most people are unaware of is that the InnoDB Transaction Log Files are the same size for all models : 128M. Given default settings (innodb_log_files_in_group = 2), that's 256M of Transaction Log Space. If you wish to set some other InnoDB option, do this:

Create a Custom DB Parameter Group (call it MySettings)
Download RDS CLI and setup a config file with your AWS Credentials
Execute the following : ./rds-modify-db-parameter-group MySettings --parameters "name=<InnoDB Option>,value=???,method=immediate"
Modify using DB Parameter Option List MySettings
Restart the MySQL RDS Instance

That's for your future consideration.

As for your import problem at hand, the AWS RDS Documentation has many suggestions for circumventing import slowness including:

Create all secondary indexes prior to loading. This is counterintuitive for those familiar with other databases. Adding or modifying a secondary index causes MySQL to create a new table with the index changes, copy the data from the existing table to the new table, and drop the original table.
Load data in PK order. This is particularly helpful for InnoDB tables where load times can be reduced by 75-80% and datafile size cut in half.
Disable foreign key constraints foreign_key_checks=0 For flat files loaded with LOAD DATA INFILE, this is required in many cases. For any load, disabling FK checks will provide significant performance gains. Just be sure to enable the constraints and verify the data after the load.
Load in parallel unless already near a resource limit. Use partitioned tables when appropriate.
Use multi-value inserts when loading with SQL to minimize statement execution overhead. When using mysqldump, this is done automatically.
Reduce InnoDB log IO innodb_flush_log_at_trx_commit=0

I hope you are not using LOAD DATA INFILE for importing. Why ? Because, only MyISAM tables benefit from it in conjunction with the bulk insert buffer (which by default is 8M). InnoDB does not benefit from it.

If you are loading InnoDB tables via mysqldump or your own SQL, all well and good. You may need to expand the Transaction Logs as I explained if your are imposing large transactions in bulk.

As a sidenote, please do not perform any large transactions if AWS is doing a snapshot. You should make sure snapshots are properly scheduled and then commit not to do any large transactions within the snapshot window. You could disable Snapshots for the duration of your bulk transaction, but the AWS RDS Documentation has a big caveat on that:

Warning: DO NOT DISABLE AUTOMATED BACKUPS IF YOU NEED TO RETAIN THE ABILITY TO PERFORM POINT-IN-TIME RECOVERY. Disabling automated backups erases all existing backups, so point-in-time recovery will not be possible after automated backups have been disabled. Disabling automated backups is a performance optimization and is not required for data loads. Note that DB Snapshots are not affected by disabling automated backups. All existing DB Snapshots are still available for restore.

Disabling automated backups will reduce load time by about 25% and reduce the amount of storage space required during the load. If you will be loading data into a new DB Instance that contains no data, disabling backups is an easy way to speed up the load and avoid using the additional storage needed for backups. However, if you will be loading into a DB Instance that already contains data; you must weigh the benefits of disabling backups against the impact of losing the ability to perform point-in-time-recovery.

DB Instances have automated backups enabled by default (with a one day retention period). In order to disable automated backups, you must set the backup retention period to zero. After the load, you can re-enable backups by setting the backup retention period to a non-zero value. In order to enable or disable backups, Amazon RDS must shut the DB Instance down and restart it in order to turn MySQL logging on or off.

Use the rds-modify-db-instance command to set the backup retention to zero and apply the change immediately. Setting the retention period to zero requires a DB Instance restart, so wait until the restart has completed before proceeding.

rds-modify-db-instance AcmeRDS --apply-immediately --backup-retention-period=0 You can check the status of your DB Instance with the rds-describe-db-instances command. The example displays the status of the AcmeRDS database instance and includes the --headers option to show column headings.

EPILOGUE

Here are my past posts on tuning InnoDB:

Mysql – Too Many database connections on Amazon RDS

Here is a Stored Procedure to kill long running SELECTs

DELIMITER $$

DROP PROCEDURE IF EXISTS `test`.`Kill_Long_Running_Selects` $$
CREATE PROCEDURE `test`.`Kill_Long_Running_Selects` (time_limit INT,display INT)
BEGIN

    DECLARE ndx,lastndx INT;

    DROP TABLE IF EXISTS test.LongRunningSelects;
    CREATE TABLE test.LongRunningSelects
    (
        id INT NOT NULL AUTO_INCREMENT,
        idtokill BIGINT,
        PRIMARY KEY (id)
    ) ENGINE=MEMORY;
    INSERT INTO test.LongRunningSelects (idtokill)
    SELECT id FROM information_schema.processlist
    WHERE user<>'system user' AND info regexp '^SELECT' AND time > time_limit;

    SELECT COUNT(1) INTO lastndx FROM test.LongRunningSelects;
    SET ndx = 0;
    WHILE ndx < lastndx DO
        SET ndx = ndx + 1;
        SELECT idtokill INTO @kill_id
        FROM test.LongRunningSelects WHERE id = ndx;
        CALL mysql.rds_kill(@kill_id);
    END WHILE;

    IF lastndx > 0 THEN
        IF display = 1 THEN
            SELECT GROUP_CONCAT(idtokill) INTO @idlist FROM test.LongRunningSelects;
            SELECT @idlist IDs_KIlled;
            SELECT CONCAT('Processes Killed : ',lastndx) Kill_Long_Running_Selects;
        END IF;
    END IF;

END $$

To kill SELECTs running longer than 30 seconds, you run this

CALL test.Kill_Long_Running_Selects(30,0);

If you want to see the connections being killed, you run this

CALL test.Kill_Long_Running_Selects(30,1);

Perhaps you can create a MySQL Event to call this Stored Procedure every minute.

If Amazon does not let you have the EVENT privilege, you will have to write an external shell script on the EC2 server to connect to the DB and run the Stored Procedure. That shell script can be put into a crontab.

If Amazon does not let you have the PROCESS and SUPER privileges, you may need to move the DB out of RDS and into another EC2 instance running MySQL to accomplish this. You could then create the MySQL Event without Amazon's hosting restrictions.

Best Answer

Related Solutions

Database Comparison – Local Database vs Amazon RDS

EPILOGUE

Mysql – Too Many database connections on Amazon RDS

Related Question