Running MySQL with InnoDB:
I have a SELECT wide_table.*
query that I want to refine to SELECT wide_table.id
since that's all the calling code needs. Testing it out I've found that the execution time with *
is faster than with id
(though the time to transfer the result over the network is faster with the "refined" version).
Why would this be the case?
The query (with names changed to protect the innocent) is:
SELECT
`things`.*
FROM
`things`
WHERE
`things`.`active` = 1
AND (owner_id IS NOT NULL
AND owner_id > 0)
AND ((`things`.`status` IN (0 , 1)
OR `things`.`status` IS NULL))
AND (date < '2015-07-11 00:00:00');
There's a compound index on active
and date
, which is being used in both versions.
For reference, the ouput of SHOW CREATE TABLE
(with irrelevant columns omitted):
CREATE TABLE `things` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`active` tinyint(1) DEFAULT '0',
`date` datetime DEFAULT NULL,
`owner_id` int(11) DEFAULT '0',
`status` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_things_on_active_and_date` (`active`,`date`),
KEY `index_things_on_date` (`date`),
KEY `index_things_on_owner_id` (`owner_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1862 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Limits
After playing around with it for a while, I've found that the comparison is affected by the limit imposed on the query. With large limits, >200, the *
version reports a faster execution time. As the limit is decreased <200, the id
version gains the upper-hand. Still not sure what to make of this…
EXPLAIN
Running EXPLAIN
on both versions of the query yields identical output, with select_type: SIMPLE
and key: index_things_on_active_and_date
.
Best Answer
The EXPLAIN plan for
SELECT *
andSELECT id
, which you said were identical, would be separated by the number of rows accessed. How are the rows being accessed ? Through theindex_things_on_active_and_date
index. TheSIMPLE
in the EXPLAIN means it is a scan. In both cases, it is an index scan based onactive=1
anddate < '2015-07-11 00:00:00'
How does the index scan occur ?
active
anddate
, the range scan would occur from two columns.WHERE
clause. You are retrievingowner_id
andstatus
and checking the values. This requires you accessing the whole row.SELECT *
means you make the whole row part of the result setSELECT id
means you make the id part of the result setWhere is this leading up to ?
id
values would take longer the build that going with the whole row you read anyway. You yourself empirically tested that and discovered the following:< 200 rows
->SELECT id
is faster> 200 rows
->SELECT *
is faster= 200 rows
->SELECT id
andSELECT *
are about the sameThere is something else you need to realize about InnoDB
This tells you that the index
index_things_on_active_and_date
actually has three columns: 1)active
, 2)date
, 3)id
.You are probably saying: Why is
SELECT *
running better thatSELECT id
? (which is the original question) It goes back to what I said: the WHERE clause is causing the Query Optimizer to check non-indexed columnsstatus
andowner_id
. You are creating additional work check an index entry and something from the row that is indexed.If you create this index
and run the two queries, then you will see the advantage go to
SELECT id
no matter how many rows you are accessing. Why ? Because all the columns in the WHERE clause are checked from the index only. This type of index is called a covering index.Here are some good links about covering indexes
I mentioned these links in some of my answers:
Feb 10, 2012
Unexpected extremely long query time (~5 minutes using nested WHEN-INs)Oct 17, 2012
: Combining columns in indexJan 11, 2013
: MySQL: To use MYISAM or INNODB engine? (plot twist enclosed)