Innodb – Fastest way to copy data from MyISAM to InnoDB

importinnodbinsertmyisamperformance

I want to copy all data from a MyISAM table with 16 millions rows to a InnoDB table, that will have a different (optimized) schema. See tables below to see the changes.

However, every approach I try leads to a slow result.

#1 INSERT

Using a simple INSERT with VALUES from a SELECT:

INSERT INTO new_table (field1,field2,...) SELECT * FROM old_table;

Really slow and it locks the original table, which I can't have locked for much longer.

#2 IMPORT FROM SQL FILE

Creating the SQL file takes minutes, but ok. Importing the file is really slow.

#3 LOAD DATA

I dump all the data to a comma separate file, which takes about 30 seconds (quite good), and then I load it with LOAD DATA INFILE. But still very slow.

Why not drop unique keys before loading the data?

Yes, it loads faster. However, when I try to add the unique keys I get this:

Lock wait timeout exceeded; try restarting transaction

Let's split the into several files?

I tried that. Each file contained 100,000 rows.

  • File 1 = 2.76 sec
  • File 2 = 1 min 40 sec
  • File 3 = 4 min 16 sec

I gave up on file 3. I'd have to do it 157 more times and on each iteration the processing time would increase. Too much time…

What is 'slow'

When I say it is slow, I mean, with unique keys off, it takes about 10 seconds to load 500,000 rows. No bad. But then I can't add unique keys because of the timeout.

With unique keys previously set, it takes almost 1 minute to insert 100,000 rows. And the throughput decreases the more rows is inserted.

The tables

MyISAM – Original table

+----------------------+-----------------------+------+-----+---------+-------+
| Field                | Type                  | Null | Key | Default | Extra |
+----------------------+-----------------------+------+-----+---------+-------+
| player_id            | int(11)               | NO   | PRI | NULL    |       |
| date                 | date                  | NO   | PRI | NULL    |       |
| time                 | int(2)                | NO   | PRI | NULL    |       |
| minutes_online       | decimal(5,0)          | NO   |     | 0       |       |
| minutes_playing      | decimal(5,0)          | NO   |     | 0       |       |
| minutes_chatting     | decimal(5,0)          | NO   |     | 0       |       |
| minutes_away         | decimal(5,0)          | NO   |     | 0       |       |
+----------------------+-----------------------+------+-----+---------+-------+

InnoDB – New table

+----------------------+-----------------------+------+-----+---------+-------+
| Field                | Type                  | Null | Key | Default | Extra |
+----------------------+-----------------------+------+-----+---------+-------+
| player_id            | mediumint(8) unsigned | NO   | PRI | NULL    |       |
| date                 | date                  | NO   | PRI | NULL    |       |
| time                 | tinyint(2)            | NO   | PRI | NULL    |       |
| minutes_online       | tinyint(2)            | NO   |     | 0       |       |
| minutes_playing      | tinyint(2)            | NO   |     | 0       |       |
| minutes_chatting     | tinyint(2)            | NO   |     | 0       |       |
| minutes_away         | tinyint(2)            | NO   |     | 0       |       |
+----------------------+-----------------------+------+-----+---------+-------+

Am I stuck on this? Meaning, will I have to accept the fact that it is slow and be patient? Or is there any way or tweak to load the data with an acceptable performance?

Solution
I split into small CSV files, applied the tweaks that Shlomi Noach suggested and then loaded each file.

Not the perfect. But it loaded a bit faster. Dumping the original table to a CSV file took less than 1 minute, which avoided the table from being locked during the whole process. A script did the job to load each file, which would be very boring to do manually.

In the end I just had to get from the original table the rows inserted/updated in the last hours and update on the new table.

Best Answer

There are a few solutions. First, however, I'm not sure about the process in the first place. Is this a one time copy operation? A recurring copy operation? Do you want to migrate from MyISAM to InnoDB?

What is the main reason for your desire for a quick operation?

If you're looking for migration, then why don't you use an online table alter tool, such as oak-online-alter-table (disclaimer: I'm author of this tool) or pt-online-schema-change? Both will allow you to change your schema live and online with very little disturbance.

If you're looking to a copy+paste of your data, then I would suggest using chunking: copying the data in small packets. This way you don't get that huge lock and no funny timeouts. You can use either oak-chunk-update or pt-archiver for this. This may actually make the total runtime shorter because of reduces locking, but may also take longer. Also consider that it is not an atomic operation, and changes to original table while copying is made, may not get caught, so you may get an inconsistent copy.

Otherwise (or in addition) you can use all the usual tweaks, such as

SET GLOBAL innodb_flush_log_at_trx_commit := 2;

or set

[mysqld]
innodb_doublewrite = 0

or perhaps, depending on OS and disks,

[mysqld]
innodb_flush_method = O_DIRECT

Each of the above may reduce disk I/O access. First two will also make your server less crash safe. But if for limited time, this may be OK for you.