Mysql – Innodb full table Scan

indexinnodbMySQL

mytable is an InnoDB table with (a, b) as the primary key.

Does this query check all table?

Select * 
from    mytable    
where   ( b='2' and c='3')

I want to find out if 'cluster' structure helps this query not check all table. (when I don't have all part of primary key in my where clause)

Would it be faster if I used MyISAM and a secondary index? Because In full table scan, It seems that Innodb caches a page for every record and so its not fast.
I've read about designing composite primary keys in an article:
"…..For the sake of example let’s assume two rows were added for a user whose user_id is 50 and they have been assigned with id=10000000 and id=10001000, where id is the primary key in this table. What sort of work the following query has to perform in order to return results?

SELECT * FROM messages WHERE user_id = 50

It finds index entires where user_id is 50. With the primary key values from the secondary index, it can then locate the physical rows. So InnoDB first seeks a row where id is 10000000. But since it does not operate on individual rows at this point, but rather on the blocks of records called pages, it locates the proper page on disk and caches it in the buffer pool. Only then it can extract the actual row, return it, and move on to the next one where id is 10001000. Since the distance between the two records is significant, it turns out that it is not on the same page that was just pulled into the cache and which holds the first row. InnoDB has to go back to disk to fetch another page, cache it, extract the row and return it.

In the above example at least two I/Os were issued. In a pessimistic case for a few hundred matching rows, when a user has that many messages, that could result in a few hundred I/O operations. Random I/O. Assuming as little as a few milliseconds for each operation, the query may need as much as one second to execute. It would use index properly, it would only pull a few hundred rows, but would still perform poorly. With the exception for flash technology any typical storage is rather slow, and also very difficult to scale, so such scenario should be avoided. ……"

Best Answer

It will do a full table scan.

The columns in an index are ordered. Think about finding a person in a list of people where the list is sorted by last name, but all you know is the first name.

PRIMARY KEY(b,a)

would be somewhat good for WHERE b=1 AND c=2 -- it could at least narrow it down to all the rows where b=1. That is, it would do a "range scan" of only part of the table.

INDEX(b,c)

would (probably) work better (given that you leave the PK at (a,b)). This because it would drill down that secondary index's BTree to exactly the row(s) with (1,2), then reach over into Data+PK to find the rest of the columns needed.

No, don't even consider MyISAM these days. There are more drawbacks than this one advantage -- if it even is an advantage.

See Cookbook for Creating Indexes.

More

The "left" part (b,a in this example) of the index can be directly used.

PRIMARY KEY(b,a,c,d) would benefit where b=1 and a=2 and d=3, but only because of b and a. The BTree (actually a B+Tree) can efficiently be scanned. The code would drill down to where b=1 and a=2 starts, then scan forward until b=1 and a=2 no longer applies. c is in the way of making use of d=3, though it will be checked.

An InnoDB BTree are composed of 16KB block(s). If you have only a few rows, every thing will be in one block. If you have a billion rows, there may be many blocks for the b=1 entries, and the BTree may be about 5 physical levels deep of nodes pointing to sub-nodes. The B+Tree means that it can scan one leaf node, then step right to the next leaf node. A rule of thumb: a block has 100 entries in it. (In reality, it is anywhere from 1 to at least 1000, depending on a lot of things.)

One thing to remember in dealing with a MySQL PRIMARY KEY -- it is UNIQUE. So don't randomly add/subtract columns from the definition; you will change what it is testing for uniqueness.