MySQL uses INDEX for optimizing ORDER BY with GROUP BY except when you add a JOIN

indexjoin;mysql-5.7order-by

Question

Why can't Query #2 use the same (car_trims.horsepower_peak) index to optimize the sorting of the rows as Query #1? The only difference between the two queries is the addition of the JOIN in Query #2.

car_trims ~50k rows

PK: (car_trims.id), index on (car_trims.horsepower_peak)

car_makes ~100 rows

PK: (car_makes.id)

Query #1

SELECT car_trims.*
FROM car_trims
GROUP BY car_trims.id
ORDER BY car_trims.horsepower_peak DESC
LIMIT 0, 200

Execution time: .0026 seconds

EXPLAIN:
Query #1 EXPLAIN

Query #2

SELECT car_trims.*
FROM car_trims
STRAIGHT_JOIN car_makes ON car_makes.id = car_trims.make_id
GROUP BY car_trims.id
ORDER BY car_trims.horsepower_peak DESC
LIMIT 0, 200

Execution time: .2533 seconds

EXPLAIN:
enter image description here

UPDATE:

I've been continuing to work on this and I believe the index is not being utilized in Query #2 because of the mixing of GROUP BY and ORDER BY. According to the MySQL Docs,

"In some cases, MySQL cannot use indexes to resolve the ORDER BY …
[for example, when] … the query has different ORDER BY and GROUP BY
expressions."

Query #1 does mix GROUP BY and ORDER BY and so theoretically the index should not be used according to the docs but I believe that may not apply if the GROUP BY is being ignored entirely due to only 1 table being selected from and grouping on the primary key.

Also, my actual original query is not quite as simple as the example provided here. The crucial difference: usage of GROUP_CONCAT in the SELECT requiring the aforementioned GROUP BY in order to prevent grouping on all rows (i.e. getting a 1-row result). The solution to that issue is using a DEPENDENT SUBQUERY, as discussed here: https://stackoverflow.com/questions/7381828/indexing-with-group-by-order-by-and-group-concat

Best Answer

Query 1: Since id is the PRIMARY KEY, it is unique. Hence the GROUP BY id does nothing. Remove it. This may make it run faster.

Query 2 does not use any columns other than id from car_makes. The only thing that the JOIN does is to verify that there is a row in car_makes for the make_id. You probably don't need that check, so get rid of car_makesin that query. That will simplify things. Note that currently there is a "filesort". Withoutcar_makes`, that step will probably go away.

As for "why can't it use the same index" -- The STRAIGHT_JOIN forces it to look at the other table first. This effectively turns the second table into

WHERE     make_id = ...
GROUP BY  id
ORDER BY  horsepower_peak DESC

To optimize such, it must first filter on make_id.