This makes all the sense in the world to me.
InnoDB creates data pages and index pages that are 16K each. If rows of data are being inserted, updated, deleted, committed, and rolled back, you are going to have FRAGMENTATION !!!
There are two cases where you can have internal fragmentation:
- A single row could be written in multiple data pages because certain column values make a row too big to fit in the data page.
- Having a TEXT column with 32K of data in it.
In those two cases, a single row spanning multiple data pages would have to be chained like a linked list. The internally generated list of data pages would always have to be navigated when the row is read.
Giving credit where credit is due, PostgreSQL implemented a very brilliant mechanism called TOAST (The Oversized-Attribute Storage Technique) to keep oversized data outside of tables to stem the tide of this kind of internal fragmentation.
Have used mysqldump to make a file with CREATE TABLE statements, followed by lots of INSERTs, you get a fresh table with no unused space along with contiguous data and index pages when loading the mysqldump into a new server.
For my explanantions, let's assume you have an InnoDB table in the CUSTODIA database called userinfo
If you would like to compress a table, you have three(3) options
Option 1
OPTIMIZE TABLE CUSTODIA.userinfo;
Option 2
ALTER TABLE CUSTODIA.userinfo ENGINE=InnoDB;
Option 3
CREATE TABLE CUSTODIA.userinfo2 LIKE CUSTODIA.userinfo;
INSERT INTO CUSTODIA.userinfo2 SELECT * FROM CUSTODIA.userinfo;
DROP TABLE CUSTODIA.userinfo;
ALTER TABLE CUSTODIA.userinfo2 RENAME CUSTODIA.userinfo;
CAVEAT : Option 3 is no good on a table with constraints. Option 3 is perfect for MyISAM.
Now for your questions:
Question 1. Why is there this difference between original and restored database size?
As explained above
Question 2. Is it safe to assume that restored database is OK, although this difference in size?
If you want to make absolutely sure that the data on both servers are identical, simply run this command on both DB servers:
CHECKSUM TABLE CUSTODIA.userinfo;
Hopefully, the checksum value is identical for the same table on both servers. If you have dozens, even hundreds, of tables, you may have to script it.
Question 3 : How does MySQL calculate data_length? Is it an estimate?
You are using the correct method in summing up the data_length and index_length. Based on my explanation of fragmentation, it is an estimate.
Question 4. Can I safely reduce production's ibdata file size to 3.6GiB with no down-time?
GOOD NEWS !!! You absolutely can compress it !!! In fact, how would like to compress it to a fraction of that number ??? Follow these two links because I addressed this issue in StackOverflow and ServerFault.
https://stackoverflow.com/questions/3927690/howto-clean-a-mysql-innodb-storage-engine/4056261#4056261
https://serverfault.com/questions/230551/mysql-innodb-innodb-file-per-table-cons/231400#231400
BAD NEWS !!! Sorry, but you will have a 3-5 minute window of downtime for rebuilding ib_logfile0 and ib_logfile1 as well shrinking ibdata1 once and for all. It's well worth it since it will be a one-time operation.
You have set aside, 12000M for your InnoDB DataFiles (ibdata1...ibdata12). The only possible way for you to have this error is if all 12000M of InnoDB DataFiles have no more room to accommodate new rows into it. How is that possible?
There are four types of information that reside in InnoDB DataFiles
- Table Data Pages
- Table Index Pages
- Table MetaData
- MVCC Data
MVCC is Multiversion Concurrency Control. This facilitates ACID Compliance and Transaction Isolation for every SQL transaction, whether it is a single SQL statement or a block of SQL Statements. Whenever you run SQL against InnoDB Tables, that will definitely involve transaction control thus introducing new MVCC Data. Even if you do not execute START TRANSACTION...COMMIT/ROLLBACK paradigms in your application, AUTOCOMMIT is on by default. That will cause InnoDB to write MVCC Data around any data you are reading and/or writing. If there is enough MVCC in the InnoDB DataFiles, it could potentially block InnoDB row data of a certain length from being written.
You have three options to make this go away:
OPTION 1 : Add one or more InnoDB DataFiles to innodb_data_file_path
innodb_data_file_path=ibdata1:1000M;ibdata2:1000M;ibdata3:1000M;ibdata4:1000M;ibdata5:1000M;ibdata6:1000M;ibdata7:1000M;ibdata8:1000M;ibdata9:1000M;ibdata10:1000M;ibdata11:1000M;ibdata12:1000M;ibdata13:1000M;ibdata14:1000M
OPTION 2 : Add autoextend to the laste InnoDB DataFile in innodb_data_file_path
innodb_data_file_path=ibdata1:1000M;ibdata2:1000M;ibdata3:1000M;ibdata4:1000M;ibdata5:1000M;ibdata6:1000M;ibdata7:1000M;ibdata8:1000M;ibdata9:1000M;ibdata10:1000M;ibdata11:1000M;ibdata12:1000M:autoextend
OPTION 3 : Cleanup the InnoDB Infrastructure
This would be the most enduring solution because there is an option to keep Table Data Pages and Table Index Pages from ever entering the InnoDB DataFiles. You would have to set this option in my.cnf
[mysqld]
innodb-file-per-table
This creates a separate tablespace (.ibd) file for each InnoDB table create after you restart mysql with this new option. Just putting in the option and restart mysql will not create the tablespace file. The added bonus for doing this is that you can collapse the innodb_data_file_path to the default:
[mysqld]
innodb_data_file_path=ibdata1:10M:autoextend
I wrote up instructions on StackOverflow on how to do this.
Best Answer
The information_schema database is an all memory database made up of temporary tables. It is not stored in any of the InnoDB plumbing.
I wrote about this 5 years ago : How is INFORMATION_SCHEMA implemented in MySQL?
If you performed a mysqldump of the infomration_schema, reimporting it will do nothing.
mysqld will correct and protect any outside manual attempts to change it.
For example, suppose you wanted to create a database called
mynewdb
and your datadir is/var/lib/mysql
. if you ran the following commandsThis will create a database in the eyes of mysqld.
mynewdb
shows up in INFORMATION_SCHEMA.SCHEMATAYou can verify this by running the following:
The database
mynewdb
will be presentAs for your actual question, don't worry. You cannot break the INFORMATION_SCHEMA. mysqld won't let you.