MySQL InnoDB Files – Structure and Page Composition

innodbMySQL

Definition of InnoDB page is here : https://dev.mysql.com/doc/internals/en/innodb-page-structure.html

InnoDB stores all records inside a fixed-size unit which is commonly called a "page" (though InnoDB sometimes calls it a "block" instead). Currently all pages are the same size, 16KB.
A page contains records, but it also contains headers and trailers. I'll start this description with a high-altitude view of a page's parts, then I'll describe each part of a page. Finally, I'll show an example. This discussion deals only with the most common format, for the leaf page of a data file.

And based on their documentation, database tables and their tuples are stored in pages, for example a part of each page contains the row data which is the tuples

what i don't understand is :

  1. which files inside the mysql folder are made of these pages? only the .ibd files or..? and do these files contain any meta data at the beginning of them to show info about these pages and their addresses?

  2. when i want to read a particular row, lets say with the PRIMARY id of 1, what are the files that DBMS reads and which parts of them? I'm asking about the DBMS interaction with files, i want to know for example does it first read the .frm file to know the metadata of that table then reads the .ibd file of that table? and then goes down in the B+tree aka that .ibd file and finds the correct page in the clustered Index? or am i wrong?

Best Answer

With innodb_file_per_table = ON and 5.7 or older:

  • Each DATABASE is manifested on disk as a directory (with the database name).
  • In that directory are a few files for each TABLE in that Database.
  • The .frm file contains (effectively) the schema for the table, and essentially nothing else.
  • The .ibd file contains one or more B+Tree. One B+Tree contains the data, organized by the PRIMARY KEY. Each secondary key lives in a similarly structured B+Tree, but organized by the key, and containing a copy of the PK in the leaf nodes.

Meanwhile, there is some meta information about the table in the common ibdata1 file. (This implies that moving the .ibd file by itself will screw up the integrity of the database.)

File_per_table=OFF and "tablespaces" are variants on the above. In the old days, before .ibd files, all tables were in ibdata1.

MySQL 8 moves a lot of the meta info, plus the schema, into the "Data Dictionary". This is implemented using InnoDB. (I cringe at the bootstrap issues.)

To find the row uniquely identified by PRIMARY KEY = 1234, here is what happens:

  1. If not already opened, the table is "opened" and put in a cache of open tables. Some information from the .frm (or DD) is copied there. The .ibd file is located (or the table is found in ibdata1), etc.
  2. Locate the PK in the table. Get its 'root' node.
  3. Drill down that B+Tree until you find the 'row' in the leaf node (aka page, aka block) that has id=1234. On average, there will be 100 rows in that block, some before 1234, some after. So, in a million-row table, the "drill-down" might hit 3 blocks. Since blocks are cached in the buffer_pool, this involves between 0 and 3 disk hits (usually toward the 0 end).
  4. Deliver all the columns (or whatever is requested).

Note: If the lookup is by a secondary key (that is, not 'clustered'), there are more steps.

Also contained in each data block are a few things to facilitate traversing the rows and the blocks. (Think the + in B+Tree; read Wikipedia if necessary.)

Within each row in the block there are other things:

  • Fields, and their lengths, possibly pointers to off-record large chunks for TEXT and BLOB. (This gets around the 8K pseudo-limit on a row.)
  • Transaction ids for MVCC
  • A "history" of the revisions (again think MVCC) of each row that has been modified, but has not yet been commited/resolved (think ROLLBACK and transaction_isolation). The Undo log is also involved in these things.

Each block anywhere in the running MySQL system is uniquely identified identified by a rather large number. Part of it the 'tablespace' (ibdata1, foo.ibd, etc.) Part of it is the block number.

("Tuples" is a rather general term; I don't know what you are referring to here.)