MySQL 5.1 -> 5.6 problem, select distinct order by limit query stops using index

indexMySQL

I'm testing a MySQL 5.1 -> 5.6 upgrade, and I've run into an issue.

I have a query that hits a ~30 million row table, selecting distinct IDs with an order by/limit restriction. Sample query is below:

select distinct MyTable.id 
from MyTable 
where related_id in (22205, 23805) 
order by name asc, id asc 
limit 50;

There are quite a few fields in the table, but these are the only ones being queried upon. There is an index on related_id, as well as a compound index on (related_id, name).

In MySQL 5.1, it used the related_id index and returned within a second or two. In MySQL 5.6, it refuses to use any index and takes several minutes. I've tried forcing index usage as well, explain still shows a full table scan regardless of which index I try to force. I've analyzed the table as well, no difference.

If I remove the IN statement and just use one related_id, it uses an index without issue. The initial query had a dozen or so related_ids, dropping below the eq_range_index_dive_limit parameter (10) made no difference, nor did disabling the dive limit.

So…any idea what's going on here? Did I miss another new parameter in 5.6? Or do we think that it believes it will hit the limit with the table scan faster than trying to use an index? But if that were the case then I'd think I'd still be able to force index usage.

Any help is appreciated.

Edit – Additional info I forgot to include:

If I remove any one of the three "trouble" clauses (DISTINCT, ORDER BY, or LIMIT) then it uses the index without issue. It seems to be the combination of those three, mixed with the IN statement in the WHERE, that causes the problem.

Edit 2 – Original hibernate generated query (minus table/column renames):

select distinct this_.id as y0_ 
from MyTable this_ 
where this_.approved=1 
  and this_.related_id in (22205, 23805, 42987, 43750, 43868, 50773, 50774, 50776, 51996, 53719, 58947) 
order by this_.name asc, 
         this_.id asc 
limit 50;

EDIT 3 – Info requested from comments

CREATE TABLE `MyTable` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `description` varchar(180) COLLATE utf8_unicode_ci DEFAULT NULL,
  `checkedDate` datetime DEFAULT NULL,
  `name` varchar(180) COLLATE utf8_unicode_ci NOT NULL,
  `title` varchar(180) COLLATE utf8_unicode_ci DEFAULT NULL,
  `updatedDate` datetime DEFAULT NULL,
  `payment_id` bigint(20) DEFAULT NULL,
  `source_id` bigint(20) NOT NULL,
  `import_id` bigint(20) DEFAULT NULL,
  `related_id` bigint(20) NOT NULL,
  `type_id` bigint(20) DEFAULT NULL,
  `approved` bit(1) NOT NULL DEFAULT b'1',
  PRIMARY KEY (`id`),
  KEY `MyTable_1_idx` (`source_id`),
  KEY `MyTable_2_idx` (`name`),
  KEY `MyTable_3_idx` (`title`),
  KEY `MyTable_4_idx` (`description`),
  KEY `MyTable_5_idx` (`related_id`),
  KEY `MyTable_6_idx` (`import_id`),
  KEY `MyTable_7_idx` (`payment_id`),
  KEY `MyTable_8_idx` (`type_id`),
  KEY `MyTable_9_idx` (`related_id`,`name`),
  KEY `MyTable_10_idx` (`updatedDate`),
  KEY `MyTable_11_idx` (`approved`,`related_id`),
  KEY `MyTable_12_idx` (`related_id`,`approved`),
  KEY `MyTable_13_idx` (`related_id`,`checkedDate`)
) ENGINE=InnoDB AUTO_INCREMENT=49699672 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

+----------+
| count(*) |
+----------+
| 29316578 |
+----------+
1 row in set (7.40 sec)

+----------+
| count(*) |
+----------+
|     5403 |
+----------+
1 row in set (0.01 sec)

+----------+
| count(*) |
+----------+
|     2492 |
+----------+
1 row in set (0.00 sec)

+----+-------------+---------+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+---------+------+----------+------------------------------------------+
| id | select_type | table   | type  | possible_keys                                                                                                                                                                                     | key           | key_len | ref  | rows     | Extra                                    |
+----+-------------+---------+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+---------+------+----------+------------------------------------------+
|  1 | SIMPLE      | MyTable | index | PRIMARY,MyTable_1_idx,MyTable_2_idx,MyTable_3_idx,MyTable_4_idx,MyTable_5_idx,MyTable_6_idx,MyTable_7_idx,MyTable_8_idx,MyTable_9_idx,MyTable_10_idx,MyTable_11_idx,MyTable_12_idx,MyTable_13_idx | MyTable_9_idx | 550     | NULL | 28960887 | Using where; Using index; Using filesort |
+----+-------------+---------+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+---------+------+----------+------------------------------------------+
1 row in set (0.00 sec)

I updated the table with ALTER TABLE MyTable FORCE last night to update the temporal columns for 5.6. I'm not sure if it was that, or updated statistics or whatnot, but now it IS using an index, but it looks like it's only using the name portion to order, not filtering on related_id. It is still slow enough that I cancelled it after a minute or so.

I also added the FORCE INDEX query/output below on the index that worked in 5.1, it refuses to use it:

explain extended 
select distinct this_.id as y0_ 
from MyTable this_ FORCE INDEX(MyTable_5_idx) 
where this_.related_id in (22205, 23805, 42987, 43750, 43868, 50773, 50774, 50776) 
order by this_.name asc, this_.id asc 
limit 50;
+----+-------------+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+---------+------+----------+----------+-----------------------------+
| id | select_type | table | type | possible_keys                                                                                                                                                                                     | key  | key_len | ref  | rows     | filtered | Extra                       |
+----+-------------+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+---------+------+----------+----------+-----------------------------+
|  1 | SIMPLE      | this_ | ALL  | PRIMARY,MyTable_1_idx,MyTable_2_idx,MyTable_3_idx,MyTable_4_idx,MyTable_5_idx,MyTable_6_idx,MyTable_7_idx,MyTable_8_idx,MyTable_9_idx,MyTable_10_idx,MyTable_11_idx,MyTable_12_idx,MyTable_13_idx | NULL | NULL    | NULL | 28960887 |     0.03 | Using where; Using filesort |
+----+-------------+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+---------+------+----------+----------+-----------------------------+
1 row in set, 1 warning (0.00 sec)

After adding the following index however, I get the following plan, and almost instant execution:

alter table MyTable 
add index MyTable_14_idx (name, id, related_id), 
LOCK=None;

+----+-------------+-------+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+---------+------+----------+----------+--------------------------+
| id | select_type | table | type  | possible_keys                                                                                                                                                                                                    | key            | key_len | ref  | rows     | filtered | Extra                    |
+----+-------------+-------+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+---------+------+----------+----------+--------------------------+
|  1 | SIMPLE      | this_ | index | PRIMARY,MyTable_1_idx,MyTable_2_idx,MyTable_3_idx,MyTable_4_idx,MyTable_5_idx,MyTable_6_idx,MyTable_7_idx,MyTable_8_idx,MyTable_9_idx,MyTable_10_idx,MyTable_11_idx,MyTable_12_idx,MyTable_13_idx,MyTable_14_idx | MyTable_14_idx | 558     | NULL | 28960887 |     0.03 | Using where; Using index |
+----+-------------+-------+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+---------+------+----------+----------+--------------------------+

I'm still not comfortable that this entirely fixes things however. The query is built dynamically from a whole variety of user inputs, and the ORDER BY column itself can and will change, along with the filter criteria. It seems that it should still be capable of using the related_id index to filter the results. Thoughts?

Best Answer

I think I've found a workaround, though I'm going to file a bug report unless someone can explain why this works.

The problem appears to originate with the id portion of the order by clause. When using either

order by name asc,
         name asc,
         id asc

or

order by name asc,
         id asc,
         id asc

then the query uses the existing indices on related_id and the results come back instantly. I had another similar query, and it was solved by changing this

order by id asc

to this

order by id asc,
         id asc

This makes absolutely no sense to me, but it works, and it all seems to be focused on the presence of order by id.