MySQL Dump Import – Causes of High I/O

Tags: dump, import, MySQL, mysqldump

I imported 100 SQL-format dump files with a total size of 20 GB. The resulting .ibd files after the import had a total size of 40 GB, yet mysqld.exe wrote 1.1 TB and read 120 GB from disk. Why so much I/O?

I used the default options of MySQL Workbench to create the SQL-format dump files, namely:

(screenshot: MySQL Workbench's default data-export options)

It seems to match the advice in the section Bulk Data Loading for InnoDB Tables of the MySQL Reference Manual pretty well (except for autocommit=0, which is not present in my dumps). The resulting headers in the dump look like:

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
/*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;
/*!40103 SET TIME_ZONE='+00:00' */;
/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;
/*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;

--
-- Table structure for table `askbot_activityauditstatus`
--
LOCK TABLES `askbot_activityauditstatus` WRITE;
/*!40000 ALTER TABLE `askbot_activityauditstatus` DISABLE KEYS */;
INSERT INTO `askbot_activityauditstatus` VALUES 
etc

I know I could try using other dump tools to speed up, but I'm curious. I'm especially impressed by the number of writes.

I use Windows 7 SP1 x64, MySQL 5.6 and MySQL Workbench 6.0.

Best Answer

Restoring a dump file, fundamentally, involves inserting all of the rows and building all of the indexes for all of the data. In that light, it's not particularly relevant how the dump file was generated, whether you use one file or several, or which of the available options you select (though some of them, like extended-insert, could make things worse if not selected).

When you're restoring a file, you're also:

  • writing the data to the transaction log
  • writing the data to the double-write buffer
  • writing the data to the tablespace files
  • writing the indexes to the double-write buffer
  • writing the indexes to the tablespace files
  • writing the data to the binary log, if that's enabled. With the default binlog_format of "statement," you're literally writing almost every byte of the dump file again, plus overhead; with binlog_format set to "row," you're writing a more compact representation to the binlogs; and with binlog_format set to "mixed," the format actually used in the binlog depends on your default transaction isolation level.
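One way to see how these write paths break down on your own server is to snapshot InnoDB's status counters before and after the import and compare the deltas. A sketch, using standard MySQL status variables:

```sql
-- Run before and after the import; the deltas show where the bytes went.
SHOW GLOBAL STATUS LIKE 'Innodb_data_written';        -- total bytes written to data files
SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';      -- bytes written to the redo (transaction) log
SHOW GLOBAL STATUS LIKE 'Innodb_dblwr_pages_written'; -- pages written via the doublewrite buffer
SHOW GLOBAL STATUS LIKE 'Binlog_cache_use';           -- transactions that used the binary log cache
```

Comparing these deltas against the size of the dump makes the write amplification from each mechanism visible directly.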

Setting innodb_flush_log_at_trx_commit from the very expensive but ACID-compliant default of 1 to the slightly less safe value of 2, or the significantly less safe value of 0, will certainly speed up your inserts. It isn't likely to reduce actual I/O, though, because that value doesn't change what's written to the transaction log; it merely changes how often InnoDB insists on confirmation from the operating system that the contents of the log have been persisted to disk. I use "safe" in the sense of safety against the loss of recent transactions if a crash occurs while the value is set to 2 or 0: 1 protects against data loss if either MySQL or the system crashes, 2 protects against data loss if MySQL crashes but not if the system does, and 0 protects against neither. Once you set it back, there are no after-effects.
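Since the setting is dynamic, it can be relaxed just for the duration of the load and restored afterward. A sketch (requires a suitably privileged account):

```sql
-- Relax durability only for the duration of the import.
SET GLOBAL innodb_flush_log_at_trx_commit = 2;  -- write log each commit, fsync roughly once per second

-- ... run the import here ...

SET GLOBAL innodb_flush_log_at_trx_commit = 1;  -- restore the fully ACID-compliant default
```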

Quick side note: some of the comment-looking things shown above are not comments. The /*!mnnrr ... */ format is a MySQL backwards-compatibility extension that tells the server, "If you're MySQL version m.nn.rr or higher, execute this statement; otherwise, disregard it."

/*!40000 ALTER TABLE `table_name` DISABLE KEYS */;
/*!40000 ALTER TABLE `table_name` ENABLE KEYS */;

Each table is surrounded by these in a dump file. They were more useful with MyISAM than with InnoDB, because DISABLE KEYS directed the storage engine not to update any non-unique indexes until ENABLE KEYS was issued, allowing all of the row data to be written first and then indexed in a batch. With InnoDB, the indexes are built as the inserts are processed... so there's a lot of I/O potential there as the index trees are built and shuffled around.
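Since InnoDB effectively ignores DISABLE KEYS, one workaround (sketched below with a hypothetical table and index) is to drop secondary indexes before the bulk load and rebuild them afterward; in MySQL 5.6, ALTER TABLE ... ADD INDEX builds the index in a single sorted pass, which involves far less random I/O than maintaining the B-tree row by row during the inserts:

```sql
-- Hypothetical table `t` with a secondary index `idx_created` on `created_at`.
ALTER TABLE t DROP INDEX idx_created;

-- ... bulk INSERTs for table t ...

ALTER TABLE t ADD INDEX idx_created (created_at);  -- rebuilt in one sorted pass
```

The primary key should be left in place, since the table itself is clustered on it.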

The size of the InnoDB buffer pool also plays a role in the amount of disk I/O -- possibly a significant one if the pool is relatively small -- because whatever can't stay in memory must be promptly flushed to disk, only to be read back in when it's needed again. This is particularly true for indexes and less so for the actual rows: in InnoDB, rows are physically stored in primary key order, and mysqldump writes them to the file in primary key order, so they're inserted in the order in which they'll be stored. The secondary indexes, however, have to be shuttled back and forth to disk as index pages are updated while the inserts on a given table progress.
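To check whether the buffer pool is contributing, compare its size against the working set. A sketch (note that in MySQL 5.6 the pool size is not dynamic, so changing it means editing my.cnf and restarting the server):

```sql
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';       -- in bytes; the default is only 128 MB
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_flushed'; -- pages pushed out to disk so far

-- In my.cnf, under [mysqld]:
--   innodb_buffer_pool_size = 4G   -- example value; size it to hold the hot data plus indexes
```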

So there's a baseline of "extra" activity for every insert, because of all the different logging, safety, and ACID mechanisms... and the indexes look like the likeliest wildcard for generating a substantial amount of additional I/O.