MySQL – Database Size Relative to Dump File

innodbMySQLrestore

I'm restoring a ~52GB dump to a MySQL database. The ibdata1 file has already exceeded the size of the dump file and the restore is still incomplete. Is there any way to estimate the final size of the ibdata1 file if the size of the MySQL dump file is known?

Best Answer

Just from the words in your question I suspect the following: You most likely have innodb_file_per_table disabled.

NOTE : The following info is based on innodb_file_per_table being disabled

When insert data into InnoDB tables, everything and its grandmother lands in the system tablespace file, better known as ibdata1. What does ibdata1 actually contain?

  • Table Data
  • Table Indexes
  • Metadata
  • MVCC Info

Table Data and Indexes while initially bloat ibdata1. The metadata is just simply the data dictionary + the list of tablespace_ids assign on a per-table basis.

What about MVCC (Multiversioning Concurrency Control)? This represents the systrem objects designed to support transaction isolation, rollbacks, undo logs, insert buffers for secondayr indexes, and the double- write buffer.

You need to clean up the InnoDB infrastructure. I already wrote StackExchange posts on how and why to do this:

Getting back to your original question, the only way to estimate the size of the ibdata1 on reload would have been to run this query before the mysqldump:

SELECT
    data_length/power(1024,3) InnoDBData,
    index_length/power(1024,3) InnoDBIndexes,
    (data_length+index_length)/power(1024,3) InnoDBSize
FROM
    information_schema.tables
WHERE
    engine='InnoDB';

This will report the size of the data in GB. Upon a fresh reload of the ibdata1 (with innodb_file_per_table disabled in your case), this would have been the rule of thumb for the size estimation.

From the dump file size, it is hard to judge because the combined total size of data pages and index pages maybe far less that the size of ibdata1 the dump was created from. That difference would be have leftover space from the bloating of MVCC system objects (rollback segments, undo logs, double-write buffer, secondary index insert buffer). From another perspective, data pages could outnumber index pages, and vice versa. This could be due to too many indexes, bad design, or just a judicious amount of data.