MySQL – Why Does mysqldump --order-by-primary Help When Changing Table Engine from MyISAM to InnoDB?

Tags: innodb, myisam, mysql, mysqldump, order-by

I found an interesting mysqldump option in the MySQL 5.7 Reference Manual, but I couldn't find any further details or explanation on the internet:

--order-by-primary

Dump each table's rows sorted by its primary key, or by its first
unique index, if such an index exists. This is useful when dumping a
MyISAM table to be loaded into an InnoDB table, but makes the dump
operation take considerably longer.

Can you explain why this helps when migrating from MyISAM to InnoDB? How does it work?

Best Answer

A MyISAM table is stored as a heap: rows are placed in the data file wherever there is empty space for them. If you only ever insert, that means new rows are appended; but when rows are updated or deleted, gaps appear and are later filled by other rows, so the order of a full table scan (which is what a dump does by default) is not defined in any way.

An InnoDB table is organized differently. Rows are stored as entries in a B-Tree indexed by the PRIMARY KEY, so the entire table is clustered on the primary index. (As a side note, an InnoDB table always has a PK: if you do not define one, InnoDB either promotes an existing unique index or creates a hidden artificial one.) That means a full table scan is actually executed by scanning all the data pages of the PRIMARY index, so the rows come back ordered by it.
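To illustrate that side note, here is a sketch of the three cases (`GEN_CLUST_INDEX` is the internal name InnoDB uses for its hidden clustered index; table and column names are placeholders):

```sql
-- Explicit PK: the table is clustered on (id).
CREATE TABLE t1 (id INT PRIMARY KEY, val VARCHAR(50)) ENGINE=InnoDB;

-- No PK, but a NOT NULL UNIQUE index: InnoDB promotes it to the clustered key.
CREATE TABLE t2 (id INT NOT NULL UNIQUE, val VARCHAR(50)) ENGINE=InnoDB;

-- Neither: InnoDB adds a hidden 6-byte row ID (GEN_CLUST_INDEX) and clusters on it.
CREATE TABLE t3 (val VARCHAR(50)) ENGINE=InnoDB;
```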

If you dump a MyISAM table ordered by its primary key, the dump may be a bit slower than an unordered one, but dumping is usually the faster half of a migration, so that's mostly a small penalty.
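A migration could then look like this (a sketch; the database, table, and file names are placeholders, and the `sed -i` syntax assumes GNU sed):

```shell
# Dump the MyISAM table with rows sorted by its primary key.
mysqldump --order-by-primary mydb mytable > mytable.sql

# Switch the engine in the dump's CREATE TABLE statement, then reload.
sed -i 's/ENGINE=MyISAM/ENGINE=InnoDB/' mytable.sql
mysql mydb < mytable.sql
```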

When you insert data into an InnoDB table, the B-Tree index rebalances itself dynamically to keep its depth logarithmic. That often means some page splits are performed.

If you insert the data in random order, each row may land in a different part of the tree, hitting different on-disk pages and triggering random page splits. That incurs an I/O performance penalty and can lead to a suboptimal fill factor (the percentage of used space in each page): a full page is split into two pages that are only 50% full, so in a bad case your data storage may be twice as large as needed (and all queries have to read twice as much data). Another possible outcome is fragmentation of the pages, where logically consecutive pages at the lowest level of the B-Tree end up stored in scattered locations on the disk (but this state is not easily identifiable and seems to be a theoretical issue, in practice encountered only rarely if ever).

But if you insert the data ordered by the PRIMARY KEY, new rows are effectively "appended" to the table. Only the "last" page on each level needs to be touched, and (if I understand it correctly) the suboptimal fill factor can be avoided because InnoDB detects that sequential insertion is happening and adds new pages more effectively instead of doing 50/50 splits. The pages may also be allocated more sequentially on disk.
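The fill-factor difference between random-order and PK-order insertion can be sketched with a toy page model. This is purely illustrative: the page capacity and split rules below are simplified assumptions, not InnoDB's real 16 KB pages or its actual split heuristics.

```python
# Toy model of B-Tree leaf pages. Each page holds up to PAGE_CAPACITY sorted
# keys. A full page normally splits 50/50; but when the overflowing key was
# appended at the very end of the last page, we do an "append split" (keep the
# old page full, start a fresh page) - mimicking InnoDB's sequential-insert
# optimization described above.
import bisect
import random

PAGE_CAPACITY = 8

def insert(pages, key):
    """Insert key into the sorted list of pages, splitting on overflow."""
    # Find the page whose key range should contain this key (linear scan is
    # fine for a toy model).
    i = 0
    while i < len(pages) - 1 and key >= pages[i + 1][0]:
        i += 1
    page = pages[i]
    bisect.insort(page, key)
    if len(page) > PAGE_CAPACITY:
        if i == len(pages) - 1 and page[-1] == key:
            # Append split: old page stays 100% full, new page starts small.
            pages[i:i + 1] = [page[:-1], page[-1:]]
        else:
            # Random insert: classic 50/50 split leaves two half-full pages.
            mid = len(page) // 2
            pages[i:i + 1] = [page[:mid], page[mid:]]

def load(keys):
    pages = [[]]
    for k in keys:
        insert(pages, k)
    return pages

if __name__ == "__main__":
    keys = list(range(1000))
    random.seed(42)
    shuffled = keys[:]
    random.shuffle(shuffled)

    # PK-order loading packs every page completely; random order needs
    # noticeably more pages for the same 1000 keys.
    for name, order in [("sequential", keys), ("random", shuffled)]:
        pages = load(order)
        fill = len(keys) / (len(pages) * PAGE_CAPACITY)
        print(f"{name:>10}: {len(pages):4d} pages, average fill {fill:.0%}")
```

Running it shows the sequential load ending at 100% average fill, while the shuffled load spreads the same keys over considerably more, partially empty pages.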

So insertion in PK order leads to faster loading and usually better storage utilization in InnoDB tables.