Mysql – Between optimizing a 1-billion rows table and dump-restoring it, which would be faster

MySQLmysqldumpoptimizationperconarestore

Have multiple tables which had more than 1 billion rows. Archived and deleted many of them, now want to recover the disk space.

Have option to either optimize the table, or dump the table and restore it.

For restoration I will be splitting the dumped sql file into multiple files and will run parallel restore upto the point the CPU is not blocking much on DiskIO, to restore at the best possible rate.

Now, from mysql's point of view alone, which would be faster, and why?

I understand that DiskIO will be an important factor, but I want the technical points from mysql's perspective.

NOTE: The disk on which mysql is, its a high performance SSD disk.

Best Answer

OPTIMIZE is probably better because what it does is

  1. CREATE TABLE ...
  2. Copy all the existing rows over
  3. RENAME ...

(Some time during or after step 2, the indexes are rebuilt.)

Dump and reload:

  1. Read the entire table, write to disk
  2. DROP TABLE or TRUNCATE TABLE and CREATE TABLE ...
  3. Read the dump, inserting into the table

This is slower because of the extra I/O for the dump.

If disk space is tight, execute the dump (mysqldum/xtrabackup/etc) and restore (mysql) from a remote machine and have the dump file on that remote machine. This may be the only way to do the cleanup without running out of disk space.

Even better...

Next time you need to do a big DELETE...

  1. CREATE TABLE new LIKE real
  2. Copy the rows you do not want to delete into new
  3. RENAME TABLE real TO old, new TO real
  4. DROP TABLE old

This 'eliminates' the effort to do the DELETEs.

Deleting big chunks is a popular topic; see my blog on the topic.

Still even better...

If this is a time series, and you are dropping "old" data, PARTITION on the date. Then DROP PARTITION to jettison the old data 'instantly'. Details.

Caveat

If you have only 20GB of spare disk, and the remaining data + index in the InnoDB table is more than 20GB, OPTIMIZE will run out of disk space and fail (without doing 'anything'). So will dump+reload unless you can use a different machine for the dump. (The 'better' solutions need setup; can't do either this time.)

If the table is in ibdata1 (not innodb_file_per_table), no space will be given back to the OS. It will be available for future growth in InnoDB tables.

Addenda

Their may be a difference is size or speed between the restore and the OPTIMIZE. The rebuilding of the indexes may be different.

If you are doing any of this to recoup space on disk, start with the smallest table first and work your way up. Keep in mind that InnoDB makes some attempt at cleaning up holes in its BTrees. In theory, a well-kept BTree will average 31% empty space, even after lots of churn. So, that's the most you can expect to recoup from OPTIMIZE (or equivalent). However, a huge delete leaves free space in addition to the 31%.