Mysql – “copy to tmp table” for 24 hours, then timeout; when ALTERing to add PK & AI

MySQLperformancequery-performance

I downloaded the Wiktionary XML dump, and ran mwdumper.jar to insert the data into MySQL.

Following the advice in the manual, I commented out the PRIMARY KEY AUTO_INCREMENT part of the CREATE TABLE statements, for page, revision and text (which have millions of rows in each, and very large BLOB's); in order to aid in a much faster import time.

After the data had been inserted, I ran the CREATE INDEX statements, and these executed fine (in ~30 seconds each).

My problem is, I'm now trying to add the PK's in, and just on the first line, it ran for 86,400 seconds (24 hours), then timed out!

SHOW FULL PROCESSLIST shows State: copy to tmp table.

Here are the queries I'm trying to run – it's stuck on the first one:

ALTER TABLE revision MODIFY COLUMN rev_id int unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT;
ALTER TABLE text MODIFY COLUMN old_id int unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT;
ALTER TABLE page MODIFY COLUMN page_id int unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT;

When I ran SHOW GLOBAL STATUS while the query was running, the Handler_read_rnd_next variable was increasing, and there were no other connected users or applications running on the db, to otherwise influence that var.

How can I speed this query up? It's running on a quad-core 3.5GHz development machine with 16GB of RAM, but it's barely using any of this power.

The only options that have been changed from the default install are:

max_allowed_packet = 1G
innodb_log_file_size = 100M
innodb_log_buffer_size = 100M

And in MySQL Workbench:

Any advice is appreciated!

Best Answer

Well, there's nothing to be done at this stage. You can't terminate the process (rather, you shouldn't) while running. You cannot modify major memory settings without restarting MySQL.

I, for one, am not enthusiast about loading all table data then loading indexes. However, doing this for the PRIMARY KEY is plain wrong. The advice you read related to secondary indexes. With adding a PRIMARY KEY InnoDB will rebuild your entire table all over again; so I'm afraid your entire first load step was redundant.

Nevertheless 7 hours for PK compared to 30 seconds for secondary index does not make sense. What is your innodb_buffer_pool_size? Which disks? RAID? Which file system?