MySQL – ‘SELECT the_table.id’ slower than ‘SELECT the_table.*’

Tags: innodb, mysql, performance, query-performance

Running MySQL with InnoDB:

I have a SELECT wide_table.* query that I want to refine to SELECT wide_table.id since that's all the calling code needs. Testing it out I've found that the execution time with * is faster than with id (though the time to transfer the result over the network is faster with the "refined" version).

Why would this be the case?

The query (with names changed to protect the innocent) is:

SELECT
    `things`.*
FROM
    `things`
WHERE
    `things`.`active` = 1
        AND (owner_id IS NOT NULL
        AND owner_id > 0)
        AND ((`things`.`status` IN (0 , 1)
        OR `things`.`status` IS NULL))
        AND (date < '2015-07-11 00:00:00');

There's a compound index on active and date, which is being used in both versions.

For reference, the output of SHOW CREATE TABLE (with irrelevant columns omitted):

CREATE TABLE `things` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `active` tinyint(1) DEFAULT '0',
  `date` datetime DEFAULT NULL,
  `owner_id` int(11) DEFAULT '0',
  `status` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `index_things_on_active_and_date` (`active`,`date`),
  KEY `index_things_on_date` (`date`),
  KEY `index_things_on_owner_id` (`owner_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1862 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

Limits

After playing around with it for a while, I've found that the comparison is affected by the limit imposed on the query. With large limits (above 200), the * version reports a faster execution time. As the limit is decreased below 200, the id version gains the upper hand. Still not sure what to make of this…

EXPLAIN

Running EXPLAIN on both versions of the query yields identical output, with select_type: SIMPLE and key: index_things_on_active_and_date.

Best Answer

The EXPLAIN plans for SELECT * and SELECT id, which you said were identical, could only be told apart by the number of rows accessed. How are the rows being accessed? Through the index_things_on_active_and_date index. Note that select_type: SIMPLE only says the query has no subqueries or UNIONs; the access path is given by the key column. In both cases, it is a scan of that index based on active = 1 and date < '2015-07-11 00:00:00'.

How does the index scan occur?

Since the chosen index has active and date, the range scan uses those two columns.
Now look at the WHERE clause. It also checks the values of owner_id and status, which are not in that index. Checking them requires accessing the whole row.
- SELECT * means you make the whole row part of the result set
- SELECT id means you make the id part of the result set

Where is this leading?

In both cases, you are reading an entire row to check two additional columns not in the index.
The result set of each query is either an entire row or single column (id).
It must take some additional time to create a smaller result set. Why? There comes a point when a longer result set of id values takes longer to build than going with the whole row you read anyway. You yourself empirically tested that and discovered the following:
- < 200 rows -> SELECT id is faster
- > 200 rows -> SELECT * is faster
- = 200 rows -> SELECT id and SELECT * are about the same

There is something else you need to realize about InnoDB:

InnoDB stores the data for a row along with the index pages making up the PRIMARY KEY. This is known as the clustered index (gen_clust_index).
Secondary indexes store the primary key columns of the row along with their index entries.

This tells you that the index index_things_on_active_and_date actually has three columns: 1) active, 2) date, 3) id.
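To make the three-column layout concrete, here is a minimal toy model in Python. It is only a sketch: the rows are invented for illustration, plain Python structures stand in for InnoDB's B-trees, and the helper name range_scan is hypothetical. It shows the secondary index carrying (active, date, id) and the extra trip to the clustered index needed to check owner_id and status.

```python
from bisect import bisect_left

# Clustered index: primary key -> full row (toy rows, invented for illustration).
clustered = {
    1: {"id": 1, "active": 1, "date": "2015-07-01", "owner_id": 7, "status": 0},
    2: {"id": 2, "active": 0, "date": "2015-07-02", "owner_id": 7, "status": 1},
    3: {"id": 3, "active": 1, "date": "2015-07-05", "owner_id": 0, "status": None},
    4: {"id": 4, "active": 1, "date": "2015-07-20", "owner_id": 9, "status": 1},
}

# Secondary index on (active, date): each entry also carries the primary key.
secondary = sorted((r["active"], r["date"], r["id"]) for r in clustered.values())

def range_scan(active, date_before):
    """Yield ids of index entries with the given active value and date < date_before."""
    start = bisect_left(secondary, (active,))
    for a, d, pk in secondary[start:]:
        if a != active or d >= date_before:
            break
        yield pk

candidate_ids = list(range_scan(1, "2015-07-11"))

# owner_id and status are not in the index, so each candidate needs a row lookup.
matches = [
    pk for pk in candidate_ids
    if (clustered[pk]["owner_id"] or 0) > 0
    and clustered[pk]["status"] in (0, 1, None)
]
print(candidate_ids, matches)
```

The index alone narrows the search to the candidate ids, but the final filter has to open every candidate row, whichever columns end up in the result set.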

You are probably saying: Why is SELECT * running better than SELECT id? (which is the original question) It goes back to what I said: the WHERE clause is causing the Query Optimizer to check the non-indexed columns status and owner_id. You are creating additional work by checking an index entry and then something from the row that is not indexed.

If you create this index

ALTER TABLE things
    ADD INDEX index_everything_and_kitchen_sink
    (active,date,owner_id,status)
;

and run the two queries, you will see the advantage go to SELECT id no matter how many rows you are accessing. Why? Because all the columns in the WHERE clause are checked from the index only, and id is already stored in every index entry. This type of index is called a covering index.
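The covering effect can be seen in a small experiment. The sketch below uses Python's built-in sqlite3 module as a stand-in, not MySQL: SQLite's planner and storage engine differ from InnoDB's, and the payload column is a hypothetical placeholder for the omitted wide columns. SQLite's secondary indexes also carry the integer primary key, so the idea carries over. In MySQL, the equivalent signal would be Using index in the Extra column of EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE things (
        id INTEGER PRIMARY KEY, active INTEGER, "date" TEXT,
        owner_id INTEGER, status INTEGER,
        payload TEXT   -- hypothetical stand-in for the omitted wide columns
    );
    CREATE INDEX index_everything_and_kitchen_sink
        ON things (active, "date", owner_id, status);
""")

QUERY = """
    EXPLAIN QUERY PLAN
    SELECT {cols} FROM things
    WHERE active = 1
      AND owner_id IS NOT NULL AND owner_id > 0
      AND (status IN (0, 1) OR status IS NULL)
      AND "date" < '2015-07-11 00:00:00'
"""

def plan(cols):
    """Return SQLite's plan description for the query with the given select list."""
    rows = conn.execute(QUERY.format(cols=cols)).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text

id_plan = plan("id")
star_plan = plan("*")
print(id_plan)
print(star_plan)
```

Only the id version should be reported as using a covering index; the * version still has to visit the table for the columns that are not in the index.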

Here are some good links about covering indexes

I mentioned these links in some of my answers:

Feb 10, 2012 : Unexpected extremely long query time (~5 minutes using nested WHEN-INs)
Oct 17, 2012 : Combining columns in index
Jan 11, 2013 : MySQL: To use MYISAM or INNODB engine? (plot twist enclosed)

YOUR QUERY

SELECT post.postid, post.attach FROM newbb_innopost AS post WHERE post.threadid = 51506;

At first glance, that query should only touch 1.1597% (62510 out of 5390146) of the table. It should be fast given the key distribution of threadid 51506.

REALITY CHECK

No matter which version of MySQL (Oracle, Percona, MariaDB) you use, none of them can beat the one enemy they all have in common: the InnoDB architecture.

InnoDB Architecture

CLUSTERED INDEX

Please keep in mind that each threadid index entry has a primary key attached. This means that when you read from the index, it must do a primary key lookup within the ClusteredIndex (internally named gen_clust_index). In the ClusteredIndex, each InnoDB page contains both data and PRIMARY KEY index info. See my post Best of MyISAM and InnoDB for more info.

REDUNDANT INDEXES

You have a lot of clutter in the table because some indexes have the same leading columns. MySQL and InnoDB have to navigate through the index clutter to get to the needed BTREE nodes. You should reduce that clutter by running the following:

ALTER TABLE newbb_innopost
    DROP INDEX threadid,
    DROP INDEX threadid_2,
    DROP INDEX threadid_visible_dateline,
    ADD INDEX threadid_visible_dateline_index (`threadid`,`visible`,`dateline`,`userid`)
;

Why strip down these indexes?

- The first three indexes start with threadid
- threadid_2 and threadid_visible_dateline start with the same three columns
- threadid_visible_dateline does not need postid since it's the PRIMARY KEY and it's embedded
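The leading-column check can be sketched in a few lines of Python. The column lists of the newbb_innopost indexes are not shown here, so this example uses the things table from the question together with the proposed index_everything_and_kitchen_sink. The helper name redundant_prefix_indexes is hypothetical. Treat the result as a list of candidates to review rather than a verdict, since a shorter index can still be worth keeping (a UNIQUE constraint, for example).

```python
def redundant_prefix_indexes(indexes):
    """Map each index to another index whose columns start with the same columns.

    An index is reported only if its column list is a strict leading prefix
    of the other index's column list.
    """
    redundant = {}
    for name, cols in indexes.items():
        for other, other_cols in indexes.items():
            is_strict_prefix = (
                name != other
                and len(cols) < len(other_cols)
                and other_cols[:len(cols)] == cols
            )
            if is_strict_prefix:
                redundant[name] = other
                break
    return redundant

# Index definitions from the things table, plus the proposed index.
things_indexes = {
    "index_things_on_active_and_date": ("active", "date"),
    "index_things_on_date": ("date",),
    "index_things_on_owner_id": ("owner_id",),
    "index_everything_and_kitchen_sink": ("active", "date", "owner_id", "status"),
}

result = redundant_prefix_indexes(things_indexes)
print(result)
```

Here only index_things_on_active_and_date is flagged, because (active, date) is the leading part of the four-column index. index_things_on_date is not flagged, since date is not the first column of any other index.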

BUFFER CACHING

The InnoDB Buffer Pool caches data and index pages. MyISAM only caches index pages.

Just in this area alone, MyISAM does not waste time caching data. That's because it's not designed to cache data. InnoDB caches every data page and index page (and its grandmother) it touches. If your InnoDB Buffer Pool is too small, you could be caching pages, invalidating pages, and removing pages all in one query.
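To see why an undersized buffer pool hurts, here is a rough simulation in Python. It is a simplification under stated assumptions: it uses a plain LRU cache, whereas InnoDB's real buffer pool uses a more elaborate LRU variant, and the page counts are hypothetical. The function name scan_with_lru is made up for this sketch.

```python
from collections import OrderedDict

def scan_with_lru(page_ids, capacity):
    """Touch pages in order through a plain LRU cache; return (misses, evictions)."""
    cache = OrderedDict()
    misses = evictions = 0
    for page in page_ids:
        if page in cache:
            cache.move_to_end(page)      # hit: mark as most recently used
            continue
        misses += 1
        cache[page] = True               # miss: load the page into the cache
        if len(cache) > capacity:
            cache.popitem(last=False)    # evict the least recently used page
            evictions += 1
    return misses, evictions

# Hypothetical workload: a query that touches 1000 distinct pages, twice over.
pages = list(range(1000)) * 2
big_pool = scan_with_lru(pages, capacity=2000)
small_pool = scan_with_lru(pages, capacity=100)
print(big_pool, small_pool)
```

With enough room, the second pass is served entirely from the cache. With the small pool, every page has been evicted before it is needed again, so the second pass misses on every page.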

TABLE LAYOUT

You could shave off some space from the row by reconsidering importthreadid and importpostid. You have them as BIGINTs. Together they take up 16 bytes in the ClusteredIndex per row.

You should run this

SELECT importthreadid,importpostid FROM newbb_innopost PROCEDURE ANALYSE();

This will recommend what data types these columns should be for the given dataset.
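As a rough illustration of the saving, the sketch below picks the smallest signed MySQL integer type for a given value range. The range used here (0 to 5390146, borrowed from the table's row count) is purely hypothetical; the real minimum and maximum would have to come from your data. The function name smallest_signed_type is made up for this example.

```python
# Signed MySQL integer types with their storage size in bytes.
INT_TYPES = [("TINYINT", 1), ("SMALLINT", 2), ("MEDIUMINT", 3), ("INT", 4), ("BIGINT", 8)]

def smallest_signed_type(min_value, max_value):
    """Return (type_name, bytes) of the smallest signed type that holds the range."""
    for name, size in INT_TYPES:
        bits = size * 8
        low, high = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        if low <= min_value and max_value <= high:
            return name, size
    raise ValueError("range does not fit in BIGINT")

# Hypothetical observed range for importthreadid and importpostid.
name, size = smallest_signed_type(0, 5_390_146)
saved_per_row = 2 * (8 - size)   # both columns are BIGINT (8 bytes) today
print(name, size, saved_per_row)
```

Under that assumed range both columns would fit in a MEDIUMINT, which would cut 10 of the 16 bytes per row.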

CONCLUSION

MyISAM has a lot less to contend with than InnoDB, especially in the area of caching.

While you revealed the amount of RAM (32GB) and the version of MySQL (Server version: 10.0.12-MariaDB-1~trusty-wsrep-log mariadb.org binary distribution, wsrep_25.10.r4002), there are still other pieces to this puzzle you have not revealed:

- The InnoDB settings
- The number of cores
- Other settings from my.cnf

If you can add these things to the question, I can further elaborate.

UPDATE 2014-08-28 11:27 EDT

You should increase threading

innodb_read_io_threads = 64
innodb_write_io_threads = 16
innodb_log_buffer_size = 256M

I would consider disabling the query cache (See my recent post Why query_cache_type is disabled by default start from MySQL 5.6?)

query_cache_size = 0

I would preserve the Buffer Pool

innodb_buffer_pool_dump_at_shutdown=1
innodb_buffer_pool_load_at_startup=1

Increase purge threads (if you do DML on multiple tables)

innodb_purge_threads = 4

Related Solutions

Mysql – Slow INSERT/UPDATE on InnoDB

Mysql – Why are simple SELECTs on InnoDB 100x slower than on MyISAM