Mysql – ‘SELECT the_table.id’ slower than ‘SELECT the_table.*’

innodbMySQLperformancequery-performance

Running MySQL with InnoDB:

I have a SELECT wide_table.* query that I want to refine to SELECT wide_table.id since that's all the calling code needs. Testing it out I've found that the execution time with * is faster than with id (though the time to transfer the result over the network is faster with the "refined" version).

Why would this be the case?

The query (with names changed to protect the innocent) is:

SELECT
    `things`.*
FROM
    `things`
WHERE
    `things`.`active` = 1
        AND (owner_id IS NOT NULL
        AND owner_id > 0)
        AND ((`things`.`status` IN (0 , 1)
        OR `things`.`status` IS NULL))
        AND (date < '2015-07-11 00:00:00');

There's a compound index on active and date, which is being used in both versions.

For reference, the ouput of SHOW CREATE TABLE (with irrelevant columns omitted):

CREATE TABLE `things` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `active` tinyint(1) DEFAULT '0',
  `date` datetime DEFAULT NULL,
  `owner_id` int(11) DEFAULT '0',
  `status` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `index_things_on_active_and_date` (`active`,`date`),
  KEY `index_things_on_date` (`date`),
  KEY `index_things_on_owner_id` (`owner_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1862 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

Limits

After playing around with it for a while, I've found that the comparison is affected by the limit imposed on the query. With large limits, >200, the * version reports a faster execution time. As the limit is decreased <200, the id version gains the upper-hand. Still not sure what to make of this…

EXPLAIN

Running EXPLAIN on both versions of the query yields identical output, with select_type: SIMPLE and key: index_things_on_active_and_date.

Best Answer

The EXPLAIN plan for SELECT * and SELECT id, which you said were identical, would be separated by the number of rows accessed. How are the rows being accessed ? Through the index_things_on_active_and_date index. The SIMPLE in the EXPLAIN means it is a scan. In both cases, it is an index scan based on active=1 and date < '2015-07-11 00:00:00'

How does the index scan occur ?

  • Since the chosen index has active and date, the range scan would occur from two columns.
  • Look at the WHERE clause. You are retrieving owner_id and status and checking the values. This requires you accessing the whole row.
    • SELECT * means you make the whole row part of the result set
    • SELECT id means you make the id part of the result set

Where is this leading up to ?

  • In both cases, you are reading an entire row to check two additional columns not in the index.
  • The result set of each query is either an entire row or single column (id).
  • It must take some additional time to create a smaller result set. Why ? There comes a point when a longer result of id values would take longer the build that going with the whole row you read anyway. You yourself empirically tested that and discovered the following:
    • < 200 rows -> SELECT id is faster
    • > 200 rows -> SELECT * is faster
    • = 200 rows -> SELECT id and SELECT * are about the same

There is something else you need to realize about InnoDB

This tells you that the index index_things_on_active_and_date actually has three columns: 1) active, 2) date, 3) id.

You are probably saying: Why is SELECT * running better that SELECT id? (which is the original question) It goes back to what I said: the WHERE clause is causing the Query Optimizer to check non-indexed columns status and owner_id. You are creating additional work check an index entry and something from the row that is indexed.

If you create this index

ALTER TABLE things
    ADD INDEX index_everything_and_kitchen_sink
    (active,date,owner_id,status)
;

and run the two queries, then you will see the advantage go to SELECT id no matter how many rows you are accessing. Why ? Because all the columns in the WHERE clause are checked from the index only. This type of index is called a covering index.

Here are some good links about covering indexes

I mentioned these links in some of my answers: