You should fully tune the MySQL Environment, particularly your InnoDB settings. (See the bottom of my Answer for tuning tips). This would be much better than fighting Amazon for elbow room in RAM/Disk. Why did I say fight?
If you just spun up an Amazon RDS instance of MySQL, you would subject yourself to whatever constraints are given. All models of MySQL Amazon RDS have the same major options but differ in only two aspects
- max_connections
- innodb_buffer_pool_size
Here is a Chart I posted a Week Ago : https://dba.stackexchange.com/a/21498/877
MODEL max_connections innodb_buffer_pool_size
--------- --------------- -----------------------
t1.micro 34 326107136 ( 311M)
m1-small 125 1179648000 ( 1125M, 1.097G)
m1-large 623 5882511360 ( 5610M, 5.479G)
m1-xlarge 1263 11922309120 (11370M, 11.103G)
m2-xlarge 1441 13605273600 (12975M, 12.671G)
m2-2xlarge 2900 27367833600 (26100M, 25.488G)
m2-4xlarge 5816 54892953600 (52350M, 51.123G)
The larger the model, the bigger the InnoDB Buffer Pool. Oh yeah, the bigger the budget needs to be.
What most people are unaware of is that the InnoDB Transaction Log Files are the same size for all models : 128M. Given default settings (innodb_log_files_in_group = 2), that's 256M of Transaction Log Space. If you wish to set some other InnoDB option, do this:
- Create a Custom DB Parameter Group (call it
MySettings
)
- Download RDS CLI and setup a config file with your AWS Credentials
- Execute the following :
./rds-modify-db-parameter-group MySettings --parameters "name=<InnoDB Option>,value=???,method=immediate"
- Modify using DB Parameter Option List
MySettings
- Restart the MySQL RDS Instance
That's for your future consideration.
As for your import problem at hand, the AWS RDS Documentation has many suggestions for circumventing import slowness including:
- Create all secondary indexes prior to loading. This is counterintuitive for those familiar with other databases. Adding or modifying a secondary index causes MySQL to create a new table with the index changes, copy the data from the existing table to the new table, and drop the original table.
- Load data in PK order. This is particularly helpful for InnoDB tables where load times can be reduced by 75-80% and datafile size cut in half.
- Disable foreign key constraints foreign_key_checks=0 For flat files loaded with LOAD DATA INFILE, this is required in many cases. For any load, disabling FK checks will provide significant performance gains. Just be sure to enable the constraints and verify the data after the load.
- Load in parallel unless already near a resource limit. Use partitioned tables when appropriate.
- Use multi-value inserts when loading with SQL to minimize statement execution overhead. When using mysqldump, this is done automatically.
- Reduce InnoDB log IO innodb_flush_log_at_trx_commit=0
I hope you are not using LOAD DATA INFILE
for importing. Why ? Because, only MyISAM tables benefit from it in conjunction with the bulk insert buffer (which by default is 8M). InnoDB does not benefit from it.
If you are loading InnoDB tables via mysqldump or your own SQL, all well and good. You may need to expand the Transaction Logs as I explained if your are imposing large transactions in bulk.
As a sidenote, please do not perform any large transactions if AWS is doing a snapshot. You should make sure snapshots are properly scheduled and then commit not to do any large transactions within the snapshot window. You could disable Snapshots for the duration of your bulk transaction, but the AWS RDS Documentation has a big caveat on that:
Warning: DO NOT DISABLE AUTOMATED BACKUPS IF YOU NEED TO RETAIN THE
ABILITY TO PERFORM POINT-IN-TIME RECOVERY. Disabling automated backups
erases all existing backups, so point-in-time recovery will not be
possible after automated backups have been disabled. Disabling
automated backups is a performance optimization and is not required
for data loads. Note that DB Snapshots are not affected by disabling
automated backups. All existing DB Snapshots are still available for
restore.
Disabling automated backups will reduce load time by about 25% and
reduce the amount of storage space required during the load. If you
will be loading data into a new DB Instance that contains no data,
disabling backups is an easy way to speed up the load and avoid using
the additional storage needed for backups. However, if you will be
loading into a DB Instance that already contains data; you must weigh
the benefits of disabling backups against the impact of losing the
ability to perform point-in-time-recovery.
DB Instances have automated backups enabled by default (with a one day
retention period). In order to disable automated backups, you must set
the backup retention period to zero. After the load, you can re-enable
backups by setting the backup retention period to a non-zero value. In
order to enable or disable backups, Amazon RDS must shut the DB
Instance down and restart it in order to turn MySQL logging on or off.
Use the rds-modify-db-instance command to set the backup retention to
zero and apply the change immediately. Setting the retention period to
zero requires a DB Instance restart, so wait until the restart has
completed before proceeding.
rds-modify-db-instance AcmeRDS --apply-immediately
--backup-retention-period=0 You can check the status of your DB Instance with the rds-describe-db-instances command. The example
displays the status of the AcmeRDS database instance and includes the
--headers option to show column headings.
EPILOGUE
Here are my past posts on tuning InnoDB:
Here is a Stored Procedure to kill long running SELECTs
DELIMITER $$
DROP PROCEDURE IF EXISTS `test`.`Kill_Long_Running_Selects` $$
CREATE PROCEDURE `test`.`Kill_Long_Running_Selects` (time_limit INT,display INT)
BEGIN
DECLARE ndx,lastndx INT;
DROP TABLE IF EXISTS test.LongRunningSelects;
CREATE TABLE test.LongRunningSelects
(
id INT NOT NULL AUTO_INCREMENT,
idtokill BIGINT,
PRIMARY KEY (id)
) ENGINE=MEMORY;
INSERT INTO test.LongRunningSelects (idtokill)
SELECT id FROM information_schema.processlist
WHERE user<>'system user' AND info regexp '^SELECT' AND time > time_limit;
SELECT COUNT(1) INTO lastndx FROM test.LongRunningSelects;
SET ndx = 0;
WHILE ndx < lastndx DO
SET ndx = ndx + 1;
SELECT idtokill INTO @kill_id
FROM test.LongRunningSelects WHERE id = ndx;
CALL mysql.rds_kill(@kill_id);
END WHILE;
IF lastndx > 0 THEN
IF display = 1 THEN
SELECT GROUP_CONCAT(idtokill) INTO @idlist FROM test.LongRunningSelects;
SELECT @idlist IDs_KIlled;
SELECT CONCAT('Processes Killed : ',lastndx) Kill_Long_Running_Selects;
END IF;
END IF;
END $$
To kill SELECTs running longer than 30 seconds, you run this
CALL test.Kill_Long_Running_Selects(30,0);
If you want to see the connections being killed, you run this
CALL test.Kill_Long_Running_Selects(30,1);
Perhaps you can create a MySQL Event to call this Stored Procedure every minute.
If Amazon does not let you have the EVENT privilege, you will have to write an external shell script on the EC2 server to connect to the DB and run the Stored Procedure. That shell script can be put into a crontab.
If Amazon does not let you have the PROCESS and SUPER privileges, you may need to move the DB out of RDS and into another EC2 instance running MySQL to accomplish this. You could then create the MySQL Event without Amazon's hosting restrictions.
Best Answer
AWS support came back with a comprehensive answer. In this case, it was transaction log backups, being written to the SQL disk before offloading to S3. There are a few more things which can consume space, mostly temporarily. As their tech outlined: