Mysql – Adding index to large thesql tables

indexindex-tuningMySQL

I have a table

| base_schedule_line_items | CREATE TABLE base_schedule_line_items (
id int(10) unsigned NOT NULL AUTO_INCREMENT,
installment int(10) unsigned NOT NULL,
on_date date NOT NULL,
actual_date date DEFAULT NULL,
payment_type int(11) NOT NULL,
scheduled_principal_outstanding decimal(65,0) NOT NULL,
scheduled_principal_due decimal(65,0) NOT NULL,
scheduled_interest_outstanding decimal(65,0) NOT NULL,
scheduled_interest_due decimal(65,0) NOT NULL,
currency int(11) NOT NULL,
updated_at datetime NOT NULL DEFAULT '2013-01-06 14:29:16',
created_at datetime NOT NULL DEFAULT '2013-01-06 14:29:16',
loan_base_schedule_id int(10) unsigned NOT NULL,
lending_id int(10) unsigned NOT NULL,
reschedule tinyint(1) DEFAULT '0',
PRIMARY KEY (id),
KEY index_base_schedule_line_items_loan_base_schedule (loan_base_schedule_id),
KEY index_bslt_spd (scheduled_principal_due),
KEY index_bslt_lending (lending_id),
KEY index_bslt_actualdate (actual_date),
KEY index_bslt_spsila (loan_base_schedule_id,scheduled_principal_due,scheduled_interest_due,actual_date),
KEY index_bslt_ondate (on_date),
KEY index_bslt_oa (on_date,actual_date),
KEY index_bslt_ol (on_date,loan_base_schedule_id),
KEY index_bslt_oli (on_date,lending_id)
) ENGINE=InnoDB AUTO_INCREMENT=30410126 DEFAULT CHARSET=utf8 |

Now this table has 30 million records in it, I need to add two more indexes to this
and its like it takes years to add it.

alter table base_schedule_line_items add index index_bslt_sla (scheduled_principal_due,actual_date,lending_id);
alter table base_schedule_line_items add index index_bslt_ssla(scheduled_principal_due,scheduled_interest_due,lending_id,actual_date);

I used the below mentioned query to find out size of table

SELECT table_name AS "Tables", round(((data_length + index_length) / 1024 / 1024), 2) "Size in MB" FROM information_schema.TABLES WHERE table_schema = "my_database_name";

The result came out as

base_schedule_line_items | 20111.00

I used this just to calculate data length and left out index length

SELECT table_name AS "Tables", round(((data_length) / 1024 / 1024), 2) "Size in MB" FROM information_schema.TABLES WHERE table_schema = "my_database_name";

and result was

base_schedule_line_items | 9497.00

The indexes

KEY index_bslt_actualdate (actual_date),
KEY index_bslt_spsila (loan_base_schedule_id,scheduled_principal_due,scheduled_interest_due,actual_date),
KEY index_bslt_ondate (on_date),
KEY index_bslt_oa (on_date,actual_date),
KEY index_bslt_ol (on_date,loan_base_schedule_id),
KEY index_bslt_oli (on_date,lending_id)

were added by me, but I am not fsure or what purpose other indices were added for. This is fairly big application . Now i need to add those two indices mentioned above as they help me extracting a report using a select statement and I find it very difficult to add them. Any help would be greatly appreciated

Best Answer

The activity of altering big tables are done in phases:

  1. Create a new table with required fields and indexes say in test DB (just structure)
  2. Dump the data from the existing table and load the same to the newly created table in test DB
  3. Now announce your downtime :)
  4. Swap the tables by renaming - RENAME table ur_db.table_name to test.temp, test.table_name to ur_db.table_name, test.temp to test.table_name; This is an atomic operation taking fraction of a second.
  5. Load the extra records to the newly created table ( the records that came after dumping and then loading). This step can be done before Step: 3 also once to reduce your downtime.
  6. And your system is back

Few notes:

  1. You need not hit information schema directly like this, try using SHOW TABLE STATUS from db like 'table_name'
  2. The speed of alter table is more or less linked with the I/O speed. Last time when we ran a direct alter table(without the above steps), we had 40GB+ table size it took around 4hours. If your 20GB data is taking years, you are working on some outdated machine.
  3. Also drop of the unwanted indexes like index_bslt_ondate, index_base_schedule_line_items_loan_base_schedule as some other indexes has the left most column as the indexed column

Please let me know if you need any clarification on any of these steps.

Edit : A simple python script to automate the process https://github.com/georgecj11/hotswap_mysql_table.git