I downloaded the Wiktionary XML dump, and ran mwdumper.jar
to insert the data into MySQL.
Following the advice in the manual, I commented out the PRIMARY KEY AUTO_INCREMENT
part of the CREATE TABLE
statements, for page
, revision
and text
(which have millions of rows in each, and very large BLOB
's); in order to aid in a much faster import time.
After the data had been inserted, I ran the CREATE INDEX
statements, and these executed fine (in ~30 seconds each).
My problem is, I'm now trying to add the PK's in, and just on the first line, it ran for 86,400 seconds (24 hours), then timed out!
SHOW FULL PROCESSLIST
shows State: copy to tmp table.
Here are the queries I'm trying to run – it's stuck on the first one:
ALTER TABLE revision MODIFY COLUMN rev_id int unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT;
ALTER TABLE text MODIFY COLUMN old_id int unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT;
ALTER TABLE page MODIFY COLUMN page_id int unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT;
When I ran SHOW GLOBAL STATUS
while the query was running, the Handler_read_rnd_next
variable was increasing, and there were no other connected users or applications running on the db, to otherwise influence that var.
How can I speed this query up? It's running on a quad-core 3.5GHz development machine with 16GB of RAM, but it's barely using any of this power.
The only options that have been changed from the default install are:
max_allowed_packet = 1G
innodb_log_file_size = 100M
innodb_log_buffer_size = 100M
And in MySQL Workbench:
Any advice is appreciated!
Best Answer
Well, there's nothing to be done at this stage. You can't terminate the process (rather, you shouldn't) while running. You cannot modify major memory settings without restarting MySQL.
I, for one, am not enthusiast about loading all table data then loading indexes. However, doing this for the
PRIMARY KEY
is plain wrong. The advice you read related to secondary indexes. With adding aPRIMARY KEY
InnoDB will rebuild your entire table all over again; so I'm afraid your entire first load step was redundant.Nevertheless 7 hours for PK compared to 30 seconds for secondary index does not make sense. What is your
innodb_buffer_pool_size
? Which disks? RAID? Which file system?