The EXPLAIN SELECT
you posted definitely seems counter-intuitive.
If your query included WHERE s.id = ...
then the query plan you're seeing might make a little bit more sense, but I'm assuming you're not.
It looks like the optimizer is getting distracted by the facts that supplier is a smaller table and that the supplier_id index in the po table can be used as a covering index... and with those facts in hand, it's overlooking the seemingly-obvious fact that the tables should be read in the opposite order than the one it chooses.
Here are two alternatives.
-- use the STRAIGHT_JOIN
directive to insist that the optimizer process the tables in only the listed order:
SELECT STRAIGHT_JOIN * FROM `po`
INNER JOIN po_suppliers s ON po.supplier_id = s.id
ORDER BY po.id ASC
LIMIT 10;
-- use the FORCE KEY
index hint to direct the optimizer to prefer the primary key of the po
table:
SELECT * FROM `po` FORCE KEY (PRIMARY)
INNER JOIN po_suppliers s ON po.supplier_id = s.id
ORDER BY po.id ASC
LIMIT 10;
The first option is probably the better option, since FORCE KEY
, in spite of the name, is still only a "hint" that the optimizer can choose to ignore, while STRAIGHT_JOIN
genuinely does force the hand of the optimizer to join the tables in the order they're listed.
Try adding two indexes:
ALTER TABLE game
ADD INDEX deleted_sportid_hometeam_date_IX -- choose names
(deleted, sportid, hometeam, date),
ADD INDEX deleted_sportid_awayteam_date_IX -- for the indexes
(deleted, sportid, awayteam, date) ;
and then running this version:
SELECT * FROM game
WHERE hometeam = 29
AND sportid = 1
AND date BETWEEN '2012-07-01' AND '2013-06-30'
AND deleted = 0
UNION ALL
SELECT * FROM game
WHERE awayteam = 29
AND sportid = 1
AND date BETWEEN '2012-07-01' AND '2013-06-30'
AND deleted = 0 ;
You can still try your original query after having added the 2 indexes and the efficiency will probably be improved (but not much, depends on the selectivity of the deleted - sportid
combination.) But I don't think the optimizer is smart enough to understand that this (rewriting) is equivalent to that (original query) and that it could use both indexes, one for the 1st and the other for the 2nd part, so you will only have a small efficiency gain.
So, the "why is it faster this way" has mainly to do with the OR
condition and with the limited ability of the optimizer to rewrite queries in equivalent forms. This rewriting for example would never be produced anyway by an automated optimizer (because I assumed that hometeam
and awayteam
are never the same, which the optimizer cannot guess.) If home team and away team could be the same, then we'd have to use UNION
(not UNION ALL
).
B-tree indexes are super-good for conditions with AND
, only. When there are ORs
or otherwise complicated conditions, they are not so good. Occasionally - like in this case - there is a way to rewrite a query so it has only ANDs
. The last part of the execution (the UNION ALL
) can be executed in MySQL by running the two subqueries one after the other and showing all results from both so there is no calculations or delay there.
Best Answer
The optimal index for that query is
in that order. And regardless of the cardinality of any of the columns.
=
test(s)year
tests are effectively one "range")(That is a simplification. For more details and varied cases: http://mysql.rjweb.org/doc.php/index_cookbook_mysql )
But... That query looks like it might return lots and lots of rows. If so, you may be network-bound. If so, then think about whether you can do more processing in SQL without shoveling a million rows to the client.