Database Comparison – Local Database vs Amazon RDS

amazon-rdsdatabase-recommendationMySQL

I am doing large INSERT into a local MySQL database

I am at 2,000,000 rows and am really starting to notice the slowdown now. I always heard that MySQL wasn't very scalable

I expect this database to get larger and I have a lot more to insert

would using an Amazon micro instance have any benefit here?

I don't foresee my upload speed being a bottleneck, so my logic is that their distributed processing will make the database quicker.

This turned out to be more of a CPU utilization issue but would amazon's solution be better, perhaps more optimized for its more available processors? With this logic a more expensive instance would have better performance

but with so many variables involved here, does anyone have experience with this to say?

Best Answer

You should fully tune the MySQL Environment, particularly your InnoDB settings. (See the bottom of my Answer for tuning tips). This would be much better than fighting Amazon for elbow room in RAM/Disk. Why did I say fight?

If you just spun up an Amazon RDS instance of MySQL, you would subject yourself to whatever constraints are given. All models of MySQL Amazon RDS have the same major options but differ in only two aspects

  • max_connections
  • innodb_buffer_pool_size

Here is a Chart I posted a Week Ago : https://dba.stackexchange.com/a/21498/877

MODEL      max_connections innodb_buffer_pool_size
---------  --------------- -----------------------
t1.micro   34                326107136 (  311M)
m1-small   125              1179648000 ( 1125M,  1.097G)
m1-large   623              5882511360 ( 5610M,  5.479G)
m1-xlarge  1263            11922309120 (11370M, 11.103G)
m2-xlarge  1441            13605273600 (12975M, 12.671G)
m2-2xlarge 2900            27367833600 (26100M, 25.488G)
m2-4xlarge 5816            54892953600 (52350M, 51.123G)

The larger the model, the bigger the InnoDB Buffer Pool. Oh yeah, the bigger the budget needs to be.

What most people are unaware of is that the InnoDB Transaction Log Files are the same size for all models : 128M. Given default settings (innodb_log_files_in_group = 2), that's 256M of Transaction Log Space. If you wish to set some other InnoDB option, do this:

  • Create a Custom DB Parameter Group (call it MySettings)
  • Download RDS CLI and setup a config file with your AWS Credentials
  • Execute the following : ./rds-modify-db-parameter-group MySettings --parameters "name=<InnoDB Option>,value=???,method=immediate"
  • Modify using DB Parameter Option List MySettings
  • Restart the MySQL RDS Instance

That's for your future consideration.

As for your import problem at hand, the AWS RDS Documentation has many suggestions for circumventing import slowness including:

  • Create all secondary indexes prior to loading. This is counterintuitive for those familiar with other databases. Adding or modifying a secondary index causes MySQL to create a new table with the index changes, copy the data from the existing table to the new table, and drop the original table.
  • Load data in PK order. This is particularly helpful for InnoDB tables where load times can be reduced by 75-80% and datafile size cut in half.
  • Disable foreign key constraints foreign_key_checks=0 For flat files loaded with LOAD DATA INFILE, this is required in many cases. For any load, disabling FK checks will provide significant performance gains. Just be sure to enable the constraints and verify the data after the load.
  • Load in parallel unless already near a resource limit. Use partitioned tables when appropriate.
  • Use multi-value inserts when loading with SQL to minimize statement execution overhead. When using mysqldump, this is done automatically.
  • Reduce InnoDB log IO innodb_flush_log_at_trx_commit=0

I hope you are not using LOAD DATA INFILE for importing. Why ? Because, only MyISAM tables benefit from it in conjunction with the bulk insert buffer (which by default is 8M). InnoDB does not benefit from it.

If you are loading InnoDB tables via mysqldump or your own SQL, all well and good. You may need to expand the Transaction Logs as I explained if your are imposing large transactions in bulk.

As a sidenote, please do not perform any large transactions if AWS is doing a snapshot. You should make sure snapshots are properly scheduled and then commit not to do any large transactions within the snapshot window. You could disable Snapshots for the duration of your bulk transaction, but the AWS RDS Documentation has a big caveat on that:

Warning: DO NOT DISABLE AUTOMATED BACKUPS IF YOU NEED TO RETAIN THE ABILITY TO PERFORM POINT-IN-TIME RECOVERY. Disabling automated backups erases all existing backups, so point-in-time recovery will not be possible after automated backups have been disabled. Disabling automated backups is a performance optimization and is not required for data loads. Note that DB Snapshots are not affected by disabling automated backups. All existing DB Snapshots are still available for restore.

Disabling automated backups will reduce load time by about 25% and reduce the amount of storage space required during the load. If you will be loading data into a new DB Instance that contains no data, disabling backups is an easy way to speed up the load and avoid using the additional storage needed for backups. However, if you will be loading into a DB Instance that already contains data; you must weigh the benefits of disabling backups against the impact of losing the ability to perform point-in-time-recovery.

DB Instances have automated backups enabled by default (with a one day retention period). In order to disable automated backups, you must set the backup retention period to zero. After the load, you can re-enable backups by setting the backup retention period to a non-zero value. In order to enable or disable backups, Amazon RDS must shut the DB Instance down and restart it in order to turn MySQL logging on or off.

Use the rds-modify-db-instance command to set the backup retention to zero and apply the change immediately. Setting the retention period to zero requires a DB Instance restart, so wait until the restart has completed before proceeding.

rds-modify-db-instance AcmeRDS --apply-immediately --backup-retention-period=0 You can check the status of your DB Instance with the rds-describe-db-instances command. The example displays the status of the AcmeRDS database instance and includes the --headers option to show column headings.

EPILOGUE

Here are my past posts on tuning InnoDB: