First, you need to know what you are doing to InnoDB when you plow millions of rows into an InnoDB table. Let's take a look at the InnoDB Architecture.
![InnoDB Architecture](https://i.stack.imgur.com/X7UrX.jpg)
In the upper left corner, there is an illustration of the InnoDB Buffer Pool. Notice there is a section of it dedicated to the insert buffer. What does that do? It is used to buffer changes to secondary indexes, migrating them from the Buffer Pool to the Insert Buffer inside the system tablespace (a.k.a. ibdata1). By default, `innodb_change_buffer_max_size` is set to 25. This means that up to 25% of the Buffer Pool can be used for processing secondary indexes.
In your case, you have 6.935 GB for the InnoDB Buffer Pool. A maximum of 1.734 GB will be used for processing your secondary indexes.
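A quick back-of-the-envelope check of that figure, using plain `awk` (the numbers come straight from the paragraph above):

```shell
# 25% of the 6.935 GB Buffer Pool is the change-buffer ceiling
awk 'BEGIN { printf "%.3f GB\n", 6.935 * 0.25 }'
```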
Now, look at your table. You have 13 secondary indexes. Each row you process must generate a secondary index entry, couple it with the primary key of the row, and send them as a pair from the Insert Buffer in the Buffer Pool into the Insert Buffer in ibdata1. That happens 13 times with each row. Multiply this by 10 million and you can almost feel a bottleneck coming.
Don't forget that importing 10 million rows in a single transaction will pile up everything into one rollback segment and fill up the UNDO space in ibdata1.
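If you cannot avoid a bulk load, one way to keep the rollback segment small is to split the file and commit per chunk. A minimal sketch, assuming a line-per-row dump file (`monster_data.tsv` is a hypothetical name):

```shell
# Demo on a synthetic file: 250,000 rows split into 100,000-row chunks,
# so each chunk can be loaded in its own transaction (e.g. with LOAD DATA
# or by wrapping each chunk's INSERTs in BEGIN/COMMIT).
seq 250000 > monster_data.tsv        # stand-in for the real dump file
split -l 100000 monster_data.tsv chunk_
ls chunk_* | wc -l                   # number of chunks to load
```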
SUGGESTIONS
SUGGESTION #1
My first suggestion for importing this rather large table would be
- Drop all the non-unique indexes
- Import the data
- Create all the non-unique indexes
SUGGESTION #2
Get rid of duplicate indexes. In your case, you have
```sql
KEY `party_id` (`party_id`),
KEY `party_id_2` (`party_id`,`status`)
```

Both indexes start with `party_id`, so the first one is redundant. Getting rid of one index out of 13 speeds up secondary index processing by at least 7.6 % (1/13 ≈ 7.7 %). You need to eventually run

```sql
ALTER TABLE monster DROP INDEX party_id;
```
SUGGESTION #3
Get rid of indexes you do not use. Look over your application code and see if your queries use all the indexes. You may want to look into pt-index-usage to let it suggest what indexes are not being used.
SUGGESTION #4
You should increase the innodb_log_buffer_size to 64M since the default is 8M. A bigger log buffer may increase InnoDB write I/O performance.
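Assuming the setting lives in `my.cnf` under `[mysqld]`, the change would be:

```
[mysqld]
innodb_log_buffer_size = 64M
```

Note that `innodb_log_buffer_size` is not dynamic in these versions, so a server restart is required for it to take effect.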
EPILOGUE
Putting the first two suggestions in place, do the following:
- Drop the 13 non-unique indexes
- Import the data
- Create all the non-unique indexes except the `party_id` index
Perhaps the following may help:

```sql
CREATE TABLE monster_new LIKE monster;
ALTER TABLE monster_new
  DROP INDEX `party_id`,
  DROP INDEX `creation_date`,
  DROP INDEX `email`,
  DROP INDEX `hash`,
  DROP INDEX `address_hash`,
  DROP INDEX `thumbs3`,
  DROP INDEX `ext_monster_id`,
  DROP INDEX `status`,
  DROP INDEX `note`,
  DROP INDEX `postcode`,
  DROP INDEX `some_id`,
  DROP INDEX `cookie`,
  DROP INDEX `party_id_2`;
ALTER TABLE monster RENAME monster_old;
ALTER TABLE monster_new RENAME monster;
```
Import the data into `monster`. Then, run this:

```sql
ALTER TABLE monster
  ADD INDEX `creation_date` (`creation_date`),
  ADD INDEX `email` (`email`(4)),
  ADD INDEX `hash` (`hash`(8)),
  ADD INDEX `address_hash` (`address_hash`(8)),
  ADD INDEX `thumbs3` (`thumbs3`),
  ADD INDEX `ext_monster_id` (`ext_monster_id`),
  ADD INDEX `status` (`status`),
  ADD INDEX `note` (`note`(4)),
  ADD INDEX `postcode` (`postcode`),
  ADD INDEX `some_id` (`some_id`),
  ADD INDEX `cookie` (`cookie`),
  ADD INDEX `party_id_2` (`party_id`,`status`);
```
GIVE IT A TRY !!!
ALTERNATIVE
You could create a table called `monster_csv` as a MyISAM table with no indexes and do this:

```sql
CREATE TABLE monster_csv ENGINE=MyISAM AS SELECT * FROM monster WHERE 1=2;
ALTER TABLE monster RENAME monster_old;
CREATE TABLE monster LIKE monster_old;
ALTER TABLE monster DROP INDEX `party_id`;
```
Import your data into `monster_csv`. Then, use mysqldump to create another import:

```shell
mysqldump -t -uroot -p mydb monster_csv | sed 's/monster_csv/monster/g' > data.sql
```

The mysqldump file `data.sql` will have extended INSERT commands that import 10,000-20,000 rows at a time.
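As a quick check of what the `sed` rewrite does (the sample line is illustrative, not taken from a real dump):

```shell
# The sed pass only renames the table in the dump's SQL statements.
echo 'INSERT INTO `monster_csv` VALUES (1);' | sed 's/monster_csv/monster/g'
```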
Now, just load the mysqldump:

```shell
mysql -uroot -p mydb < data.sql
```
Finally, get rid of the MyISAM table:

```sql
DROP TABLE monster_csv;
```
That's true.
In 5.1 (and 5.5), adding a column (of any type) to a table required locking the table, copying the entire table over, and rebuilding all the indexes. 5.6 improves significantly on the task -- many `ALTER`s can be done with little or no interruption of other activity. (The time taken varies with the task.)
You should plan for upgrading to 5.5, 5.6, or even 5.7. The farther behind you get, the harder it is to upgrade.
`innodb-buffer-pool-size = 14G` is dangerously high for a 16GB machine; 11G would be safer. If MySQL "swaps", performance becomes terrible.
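If it helps to see it spelled out, the safer value in `my.cnf` would be (assuming the setting lives under `[mysqld]`):

```
[mysqld]
innodb_buffer_pool_size = 11G
```

This leaves roughly 5GB of headroom for the OS, per-connection buffers, and everything else on the box.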
The table has an awful lot of indexes.
Are you familiar with "composite" indexes?
Best Answer
Just in case someone else comes looking for the same type of information: I decided to import only a subset of wikipedia pages (the most popular ones), so I built a couple of tools that generate the list of the most popular wikipedia pages (from the monthly logs) and filter the sqldump data files before import.