The table is simple:
CREATE TABLE `t1` (
`ID` int NOT NULL AUTO_INCREMENT,
`name` int DEFAULT '1',
`LastName` int DEFAULT '0',
UNIQUE KEY `idx_ID` (`ID`) USING BTREE,
KEY `idx_name` (`name`)
) ENGINE=InnoDB;
DELIMITER $$
DROP PROCEDURE IF EXISTS populate_t1;
CREATE PROCEDURE populate_t1(IN num INT)
BEGIN
DECLARE COUNT INT DEFAULT 0;
WHILE COUNT < num DO
INSERT INTO t1(`name`,`LastName`) VALUES(round(rand()+5, 0), round(rand()*10000, 0));
SET COUNT = COUNT + 1;
END WHILE;
END $$
DELIMITER ;
CALL populate_t1(1000000);
CREATE INDEX idx_name ON t1 (`name`);
OPTIMIZE TABLE t1;
The column name
has very low selectivity (i.e., only 2 distinct values). Query
> SELECT * FROM t1 WHERE Name=5;
> OK, 499845 records, Time: 0.840s
would be slower than the query which use a full table scan
> DROP INDEX idx_name ON t1;
> OK
> SELECT * FROM t1 WHERE Name=5;
> OK, 499845 records, Time: 0.258s
This performance drop is expected, but I want to figure out why.
I know that the index is like an English dictionary index in the first several pages of an English dictionary. And the index is stored in a B+tree in my example I suppose. So with an index on column name
, MySQL would know which rows satisfy name=5
. Then MySQL fetches these rows into the result set. I think the process above would be quicker than scanning the whole table (but I am wrong). So what I want to ask is which process makes (e.g., loading index from disk to memory, searching in the B+tree, etc.) the index scan slower than the full table scan.
Further, I know that if the table is larger, the performance difference would be larger (I still want to know why).
I would be very grateful for your help!
Thanks!
Best Answer
Access to a row using a non-clustered index, which
idx_name
is, requires extra random I/O: b-tree lookup finds the clustered (primary key) index value, then you need to go and fetch the actual row from the clustered index.The alternative is a sequential scan of the clustered index itself, which does not incur that extra I/O cost and is also simply more efficient by being sequential, not random.