Mysql – Why would the size of MySQL MyISAM table indexes (aka MYI file) not match after thesqldump import

importindexmyisamMySQLmysqldump

I've been archiving some MySQL (version 5.0.96) MyISAM tables locally and deleting from my server after checking the data. Every time the process has been pretty similar and comes off mostly without a hitch. In the case of using mysqldump for a single table here is my process…

  1. Create a gzipped dump of the table in question. (mysqldump -u root -p db_name table_name | gzip > table_name.sql.gz)
  2. Download to my local computer, reverse gzip.
  3. Import into the local MySQL (same version) using "source PATH". (note: online server is linux, local computer is Windows… hasn't made a difference so far though)
  4. Compare CHECKSUM TABLE database.table; of online version and local version.
  5. Compare row count, data size, and index size. (just to be sure checksum didn't fail me somehow)
  6. If everything checks out, drop online table.

Like I said, mostly, this has been smooth sailing and after I'm satisfied I drop the online version of the table. However, for the most recent table I've gotten hung up on step 5 and I'm at a loss as to why exactly. On my latest import everything in step 4 and 5 matches… except the index size. It's a mismatch to the tune of 2.5 GB!

I know what you're thinking…

  1. The base data is most likely fine then, proceed with drop.
  2. Your online indexes are fragmented.

I would agree, however, my testing doesn't fully illuminate this and I want to be sure I understand what is going on before I drop the online table. (and for future reference as well)

After every previous test the index size matched upon importing the dump. Granted, it doesn't matter if point 2 applies. However, to address point 2 I ran OPTIMIZE TABLE against the online version of the table and the index size did not change.

Regarding disk space in general… I had 75 GB free locally and about 59 GB free online at the time of testing. Either should be enough for importing or optimizing and I received no messages indicating a failure in either case.

I've hit an impasse on trying to explain this discrepancy to myself. Also, I should note I ran through this entire process twice to ensure I wasn't missing something obvious. So…

Why would the size of MySQL MyISAM table indexes (aka MYI file) not match after mysqldump import?

  1. Fragmentation (Ran OPTIMIZE TABLE, no luck syncing size)
  2. Corruption (Ran through process twice and row count, data size, checksum all match)
  3. …?

I can usually solve my own problems… consider me humbled. Any ideas are welcome.

PS I hope this makes for a good first question ;P

Local table after import:

+----------------+-------------+----------------+---------------+
| database.table | table_rows  | data_length    | index_length  |
+----------------+-------------+----------------+---------------+
| database.table | 145,250,083 | 23,385,263,363 | 5,853,283,328 |
+----------------+-------------+----------------+---------------+

Online source table:

+----------------+-------------+----------------+---------------+
| database.table | table_rows  | data_length    | index_length  |
+----------------+-------------+----------------+---------------+
| database.table | 145,250,083 | 23,385,263,363 | 8,199,018,496 |
+----------------+-------------+----------------+---------------+


mysql> OPTIMIZE TABLE database.table;

+-------------------------------------+----------+----------+----------+
| Table                               | Op       | Msg_type | Msg_text |
+-------------------------------------+----------+----------+----------+
| database.table                      | optimize | status   | OK       |
+-------------------------------------+----------+----------+----------+
1 row in set (17 min 41.92 sec)

mysql> SELECT CONCAT(table_schema, '.', table_name) AS 'database.table',
    -> FORMAT(table_rows, 0) AS table_rows,
    -> FORMAT(data_length, 0) AS data_length,
    -> FORMAT(index_length, 0) AS index_length
    -> FROM information_schema.TABLES
    -> WHERE table_name = 'table';

+----------------+-------------+----------------+---------------+
| database.table | table_rows  | data_length    | index_length  |
+----------------+-------------+----------------+---------------+
| database.table | 145,250,083 | 23,385,263,363 | 8,199,018,496 |
+----------------+-------------+----------------+---------------+

The table schema is below. I ran SHOW CREATE TABLE both online and locally and then put the results in two Notepad++ panes and ran "Compare". It returned as an exact match. I've obfuscated some column names, but the rest of the column (and other) definitions have remained the same.

CREATE TABLE `generic_dummy_table_name` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `user` char(22) NOT NULL,
  `num_column_1` mediumint(8) unsigned default '0',
  `num_column_2` int(10) unsigned default '0',
  `num_column_3` int(10) unsigned default '0',
  `num_column_4` mediumint(8) unsigned default '0',
  `num_column_5` mediumint(8) unsigned default '0',
  `num_column_6` mediumint(8) unsigned default '0',
  `num_column_7` mediumint(8) unsigned default '0',
  `num_column_8` bigint(11) unsigned default '0',
  `datetime_1` datetime NOT NULL default '0000-00-00 00:00:00',
  `datetime_1_actual` char(29) default NULL,
  `datetime_2` datetime NOT NULL default '0000-00-00 00:00:00',
  `datetime_2_actual` char(29) default NULL,
  `datetime_3` datetime NOT NULL default '0000-00-00 00:00:00',
  `datetime_4` datetime NOT NULL default '0000-00-00 00:00:00',
  `updated_at` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  `created_at` datetime NOT NULL default '0000-00-00 00:00:00',
  PRIMARY KEY (`id`),
  KEY (`user`,`datetime_4`),
  KEY (`datetime_4`)
) ENGINE=MyISAM AUTO_INCREMENT=1257737641 DEFAULT CHARSET=latin1

Note: The auto increment value might seem odd versus row count, but worry not… that's because this is not the first time I've archived this table.

Best Answer

I think I have a plausible explanation you find very intriguing.

When you loaded the MyISAM table, only one index got loaded into the .MYI file in a lopsided manner.

Which index would that be ? THE PRIMARY KEY. Why ?

Indexes usually use BTREEs. They are designed to collect keys into a single BTREE node and balance the BTREE internally until the BTREE is full. On the insert into the full node, the full tree node is split and the keys inside are divided.

Most often a BTREE node will split is when you load the keys in order. For proof, look at the worst case BTREE --- the balanced binary tree.

I have several posts in the DBA STackExchange where I mention how a balanced binary only has node with one key. I also mention how a leaf pages must rebalance about 45% of the time when inserting a new key. Here are those posts:

My working theory is that random inserts into the table may actually make the .MYI smaller by prevent tree node splits as much as possible.

In order to make this happen, you will have to something quite unusual. Ready for this ?

ALTER TABLE generic_dummy_table_name ORDER BY datetime;

or

ALTER TABLE generic_dummy_table_name ORDER BY user,datetime;

That right, you can reorder the physical rows of the table. This may make the indexes a different size by inducing BTREE splits on another index.

Try reordering the table. Then, mysqldump it and reload it. You may find that the .MYI may get smaller or bigger. You cannot really predict it.

Next time, try reloading the MyISAM table with a very large bulk insert buffer.

Just run

SET GLOBAL bulk_insert_buffer_size = 1024 * 1024 * 1024;

then connect to mysql and reload the mysqldump.

GIVE IT A TRY !!!