Mysql – Simple SELECT takes forever, but works as a subquery. Why

execution-planMySQLoptimization

I'm running a MySQL 5.1 server and I have a MyISAM table called my_table with nearly 5 million records. Its structure is the following:

+-----------------------------------------------------------------------+
|Field           |Type      |Null|Key|Default            |Extra         |
+-----------------------------------------------------------------------+
|id              |int(11)   |NO  |PRI|                   |auto_increment|
+-----------------------------------------------------------------------+
|article_id      |int(11)   |NO  |MUL|0                  |              |
+-----------------------------------------------------------------------+
|user_id         |int(11)   |NO  |MUL|0                  |              |
+-----------------------------------------------------------------------+
|timestamp       |datetime  |YES |MUL|0000-00-00 00:00:00|              |
+-----------------------------------------------------------------------+
|txt             |text      |YES |   |                   |              |
+-----------------------------------------------------------------------+
|soukrome        |tinyint(4)|NO  |MUL|0                  |              |
+-----------------------------------------------------------------------+
|vazna_kritika   |tinyint(4)|NO  |   |0                  |              |
+-----------------------------------------------------------------------+
|kos             |tinyint(4)|NO  |MUL|0                  |              |
+-----------------------------------------------------------------------+
|done            |tinyint(4)|NO  |MUL|0                  |              |
+-----------------------------------------------------------------------+
|autor_id        |int(11)   |NO  |MUL|0                  |              |
+-----------------------------------------------------------------------+
|sbirka_id       |int(11)   |NO  |MUL|0                  |              |
+-----------------------------------------------------------------------+
|klub_id         |int(11)   |NO  |MUL|0                  |              |
+-----------------------------------------------------------------------+
|klub_id_procleny|int(11)   |NO  |MUL|0                  |              |
+-----------------------------------------------------------------------+
|notinlist       |tinyint(1)|NO  |MUL|0                  |              |
+-----------------------------------------------------------------------+
|admin_id        |int(11)   |NO  |   |0                  |              |
+-----------------------------------------------------------------------+
|kritika_redakce |tinyint(1)|NO  |   |0                  |              |
+-----------------------------------------------------------------------+

Now, when I run this simple query that should return the time of the latest comment, the query hangs for several minutes:

SELECT timestamp FROM my_table WHERE article_id = 123456 ORDER BY timestamp DESC LIMIT 1

The strange thing about this is that for most article_ids the query finishes almost instantly. It only hangs for some article_ids.

EXPLAINing the query doesn't help. The execution plan seems pretty standard:

+----------------------------------------------------------------------------------------------------+
|id|select_type|table   |type|possible_keys|key       |key_len|ref  |rows|Extra                      |
+----------------------------------------------------------------------------------------------------+
|1 |SIMPLE     |my_table|ref |article_id   |article_id|4      |const|1   |Using where; Using filesort|
+----------------------------------------------------------------------------------------------------+

Now, the weirdest thing is, that if I rewrite the query like this, everything starts to behave correctly:

SELECT * FROM ( SELECT timestamp FROM my_table WHERE article_id = 123456 ) _my_table ORDER BY timestamp DESC LIMIT 1

I'm just really curious if anyone has any idea why such an issue might be happening?

EDIT: Adding SHOW CREATE TABLE and SHOW FULL PROCESSLIST.

CREATE TABLE `my_table` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `article_id` int(11) NOT NULL DEFAULT '0',
  `user_id` int(11) NOT NULL DEFAULT '0',
  `timestamp` datetime DEFAULT '0000-00-00 00:00:00',
  `txt` text,
  `soukrome` tinyint(4) NOT NULL DEFAULT '0',
  `vazna_kritika` tinyint(4) NOT NULL DEFAULT '0',
  `kos` tinyint(4) NOT NULL DEFAULT '0',
  `done` tinyint(4) NOT NULL DEFAULT '0',
  `autor_id` int(11) NOT NULL DEFAULT '0',
  `sbirka_id` int(11) NOT NULL DEFAULT '0',
  `klub_id` int(11) NOT NULL DEFAULT '0',
  `klub_id_procleny` int(11) NOT NULL DEFAULT '0',
  `notinlist` tinyint(1) NOT NULL DEFAULT '0',
  `admin_id` int(11) NOT NULL DEFAULT '0',
  `kritika_redakce` tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `timestamp` (`timestamp`),
  KEY `user_id` (`user_id`),
  KEY `autor_id` (`autor_id`),
  KEY `sbirka_id` (`sbirka_id`),
  KEY `klub_id` (`klub_id`),
  KEY `article_id` (`article_id`),
  KEY `no` (`kos`),
  KEY `done` (`done`),
  KEY `soukrome` (`soukrome`),
  KEY `klub_id_procleny` (`klub_id_procleny`),
  KEY `notinlist` (`notinlist`)
) ENGINE=MyISAM AUTO_INCREMENT=4470321 DEFAULT CHARSET=utf8

If I remember it correctly, the processlist showed something along these lines (I'm unable to reproduce the issue on the fixed query and I don't remember the problematic article_ids):

Id    User  Host       db  Command  Time   State         Info
74814 root  localhost  p2  Query    62     Sending data  SELECT timestamp FROM...
74815 root  localhost  p2  Query    62     Sleep         

Best Answer

As your table is structured now, there are only two "good" options for satisfying your query, neither one ideal, and one of them isn't likely to be very good at all:

  • find all of the articles matching 'article_id' using the (article_id) index, and then sort their timestamps to find the largest one (the one shown in the question)... or
  • iterate through all the articles starting with the largest timestamp by reading row data in the order specified in the (timestamp) index and working backwards by timestamp until it finds one with the correct article_id... which would vary wildly in performance but could be a sensible strategy in some sort of parallel universe :) or if certain article_id values are particularly common, in which case a strategy like this might actually have been a good choice... but my guess is the optimizer might be incorrectly choosing this one on occasion, since the structures do support it.

The thing is, the execution plan can actually change for a given query depending on the specific argument in the WHERE clause. It could be that you're getting a different plan for particular values -- the question seems to imply that your EXPLAIN output may not have been from one of the problematic rows, so this may not be the plan chosen for the other article_id values.

If I'm correct that you haven't captured the query plan for one of the problematic values, it could be that the subquery makes your intentions somehow more obvious and changes the plan.

Really, though, if this is a common query, you need an index on both columns together.

ALTER TABLE my_table ADD KEY (article_id,timestamp);

This index should always be the obvious choice from the optimizer's perspective, since it knows exactly where to find the largest timestamp for any given article, so this index should eliminate "Using where" and "Using filesort" and result in all such queries being faster, though the difference may be less significant for some queries that are performing acceptably now.