Mysql – nnoDB use a clustered index as a covering index

clustered-indexinnodbMySQLtree

The scenario I have in mind is a query like:

SELECT id FROM table WHERE id = ?

or equivalently,

   SELECT 1 FROM table WHERE id = ?

So here's some background:

In a B+ tree (which InnoDB uses as its clustered index (PK) data structure),
internal nodes contain records which are essentially (key, *pointer-to-page) tuples (obviously simplifying a bit).
Formally, the key values in the internal nodes don't need to be extant, i.e. there doesn't have to be a corresponding row in a leaf node with that key.
But, in practice, they most often are.

Now not all key values exist in internal nodes, and in that case, obviously a leaf node will have to be loaded and searched for the asked-for key. But if there was a matching key found in an internal node, and it is known to be extant in an actual leaf record, then InnoDB could take advantage of those facts and save loading the leaf page (and therefore potentially save an expensive disk read).

So my questions are:

  • Do InnoDB internal nodes only contain extant keys? (would need to deal with deletes and even updates of a primary key)

  • If so, does InnoDB take advantage of this in resolving queries of the above kind; or does it always load a leaf page?

  • Is there anyway using the existing MySQL profiling stats (SHOW STATUS . . ., SHOW VARIABLE, INFORMATION_SCHEMA, etc) to determine whether a particular id (e.g. SELECT 1 FROM table WHERE id = 345) actually elided a leaf-page load?

Best Answer

How indexes work in InnoDB. This discussion is limited to InnoDB.

(Let me establish some ground work, then I will get to your question.)

The PRIMARY KEY is a BTree where the leaf nodes contain all the columns of the row. That is, the PK and the data coexist in the same BTree.

Each secondary key is a BTree where the leaf nodes contain a copy of the PRIMARY KEY. Hence, a second lookup is needed to actually get to the rest of the columns.

DELETEing a row requires removing a 'row' from each BTree.

UPDATEing a column that is not in any index requires (1) locating the record (in the data BTree), and (2) modifying the record. (Plus lots of Multi Version Concurrency Control stuff.)

UPDATEing a column that is in some index(s) requires (logically) deleting the record from each index where the column was part of that index and inserting a row in another place in that BTree.

SELECTing a row via the PK (WHERE Id = ...) usually requires drilling down the BTree to the leaf node. (You have presented a rare exception.)

SELECTing a row via a secondary key (WHERE other = ...) requires two steps: (1) drill down the secondary BTree to the leaf node, fetching the PK columns, and then maybe (2) drill down the PK BTree to get to the record.

The "maybe" depends on whether or not all the needed columns are found in the secondary index (keep in mind that that includes the columns of the PK). If everything is found in the secondary BTree, it is called a "covering index" for that select.

An optimization is to add extra non-TEXT, non-BLOB columns to a non-UNIQUE index to make it "covering". (There are diminishing returns if the columns you want to add are big, or there are "too many". I suggest no more than 5 columns in an index, but there are exceptions.)

The next-to-bottom node in a BTree contains only the boundaries of the leaf node. That is, for SELECT 1 FROM table WHERE id = 345, the "345" probably exists only in the leaf node.

The only case I know of for not going all the way to the leaf node is when ANALYZE TABLE is gathering statistics.

The + in B+Tree refers to the leaf pages being linked together, thereby making "range" scans (in table or in index) slightly more efficient. A hundred (or so) rows are found in one leaf node block, then the link gets it to the next block for the next hundred. Otherwise, it would have to come "back down" the tree to find the 'next' block. An InnoDB 'block' is 16KB.

Back to your key question. Why not optimize the case where 345 can be seen in a non-leaf node? Because it is such a rare query that it does not warrant extra code, extra code to break, and might not average out as being better. Some issues:

  • Only 1% (or so) of keys would be sped up.
  • The block is likely to be in cache already.
  • The block is likely to be needed -- Note that your example requires SELECT 1, which is a rather rare subset of selects. It would take extra effort to discover this special case.
  • If the row needs to be locked (for MVCC, etc), then it needs to get to the leaf node. This probably requires that the leaf node be fetched.

A Rule of Thumb of mine: If some optimization is not likely to save 10% (by some metric), I should look elsewhere to improvements.