Mysql – Innodb full table Scan

indexinnodbMySQL

mytable is an InnoDB table with (a, b) as the primary key.

Does this query check all table?

Select * 
from    mytable    
where   ( b='2' and c='3')

I want to find out if 'cluster' structure helps this query not check all table. (when I don't have all part of primary key in my where clause)

Would it be faster if I used MyISAM and a secondary index? Because In full table scan, It seems that Innodb caches a page for every record and so its not fast.
I've read about designing composite primary keys in an article:
"…..For the sake of example let’s assume two rows were added for a user whose user_id is 50 and they have been assigned with id=10000000 and id=10001000, where id is the primary key in this table. What sort of work the following query has to perform in order to return results?

SELECT * FROM messages WHERE user_id = 50

It finds index entires where user_id is 50. With the primary key values from the secondary index, it can then locate the physical rows. So InnoDB first seeks a row where id is 10000000. But since it does not operate on individual rows at this point, but rather on the blocks of records called pages, it locates the proper page on disk and caches it in the buffer pool. Only then it can extract the actual row, return it, and move on to the next one where id is 10001000. Since the distance between the two records is significant, it turns out that it is not on the same page that was just pulled into the cache and which holds the first row. InnoDB has to go back to disk to fetch another page, cache it, extract the row and return it.

In the above example at least two I/Os were issued. In a pessimistic case for a few hundred matching rows, when a user has that many messages, that could result in a few hundred I/O operations. Random I/O. Assuming as little as a few milliseconds for each operation, the query may need as much as one second to execute. It would use index properly, it would only pull a few hundred rows, but would still perform poorly. With the exception for flash technology any typical storage is rather slow, and also very difficult to scale, so such scenario should be avoided. ……"

Best Answer

It will do a full table scan.

The columns in an index are ordered. Think about finding a person in a list of people where the list is sorted by last name, but all you know is the first name.

PRIMARY KEY(b,a)

would be somewhat good for WHERE b=1 AND c=2 -- it could at least narrow it down to all the rows where b=1. That is, it would do a "range scan" of only part of the table.

INDEX(b,c)

would (probably) work better (given that you leave the PK at (a,b)). This because it would drill down that secondary index's BTree to exactly the row(s) with (1,2), then reach over into Data+PK to find the rest of the columns needed.

No, don't even consider MyISAM these days. There are more drawbacks than this one advantage -- if it even is an advantage.

See Cookbook for Creating Indexes.

More

The "left" part (b,a in this example) of the index can be directly used.

PRIMARY KEY(b,a,c,d) would benefit where b=1 and a=2 and d=3, but only because of b and a. The BTree (actually a B+Tree) can efficiently be scanned. The code would drill down to where b=1 and a=2 starts, then scan forward until b=1 and a=2 no longer applies. c is in the way of making use of d=3, though it will be checked.

An InnoDB BTree are composed of 16KB block(s). If you have only a few rows, every thing will be in one block. If you have a billion rows, there may be many blocks for the b=1 entries, and the BTree may be about 5 physical levels deep of nodes pointing to sub-nodes. The B+Tree means that it can scan one leaf node, then step right to the next leaf node. A rule of thumb: a block has 100 entries in it. (In reality, it is anywhere from 1 to at least 1000, depending on a lot of things.)

One thing to remember in dealing with a MySQL PRIMARY KEY -- it is UNIQUE. So don't randomly add/subtract columns from the definition; you will change what it is testing for uniqueness.

Related Solutions

Mysql – have create an InnoDB table without a PK? If so, would a secondary index be used as the clustered index

If you have a unique key that is not null, it gets promoted internally to be the clustered key (or if you have more than one that matches this criteria - the first one does.)

If you don't have a unique key that is not null, a hidden key is created that is 6 bytes in length.

There is no way to specify the key to cluster around, and there is no way to not have a clustered index.

Mysql – What would be the fastest approach to bulk update this thesql table (Innodb)

The first thing you need to determine about col2 is if it can be a PRIMARY KEY.

Run this query

SELECT COUNT(1),col2 FROM table GROUP BY col2 HAVING COUNT(1) > 1;

If nothing comes back, then col2 can be a UNIQUE KEY. If even one row comes back, then col2 cannot be a UNIQUE KEY. You can create an index on it.

Since this query would take a while without an index, just go ahead and index it

ALTER TABLE table ADD INDEX col2 (col2);

but that might take a very long time.

You may need a temp table to accomplish this:

Step 01) Create a table that will hold the one million col2 values

DROP TABLE IF EXISTS col2values;
CREATE TABLE col2values ENGINE=InnoDB SELECT col2 FROM table WHERE 1=2;
ALTER TABLE col2value ADD PRIMARY KEY (col2);

Step 02) You can load the col2values table with the one million col2 values

Step 03) Index col2

CREATE TABLE table2 LIKE table;
ALTER TABLE table2 ADD INDEX col2 (col2);
INSERT INTO table2 SELECT * FROM table;
DROP TABLE table;
ALTER TABLE table2 RENAME table;

Step 04) Perform update using INNER JOIN

UPDATE col2values A
INNER JOIN table B
USING (col2)
SET B.col1=1;

In the above steps

Step 03 needs to be done only once
Step 03 is done with 5 lines rather than ALTER TABLE table ADD INDEX col2 (col2); because it would require a full rollback operation should anything go wrong.
All other steps are your bulk insert steps going forward

Give it a Try !!!

Best Answer

Related Solutions

Mysql – have create an InnoDB table without a PK? If so, would a secondary index be used as the clustered index

Mysql – What would be the fastest approach to bulk update this thesql table (Innodb)

Related Question