Mysql – Simple SQL query but huge table – how to optimize

MySQL

I've got this very simple MySQL query:

SELECT target FROM table WHERE goal_id=1 AND year>=2015 AND year<=2020

The table has about 5 millions rows, though. And as a result, it's very slow (about 10 seconds).

What could I do to improve things? Would an index help, and if so on which column?

Best Answer

The optimal index for that query is

INDEX(goal_id, year, target)

in that order. And regardless of the cardinality of any of the columns.

Start with the = test(s)
Add on at most one range (the two year tests are effectively one "range")
Add on all the other columns found anywhere in the query; this makes the index "covering" for another boost. Now the entire query is performed in the index's BTree, without also having to bounce over to the data's BTree.

(That is a simplification. For more details and varied cases: http://mysql.rjweb.org/doc.php/index_cookbook_mysql )

But... That query looks like it might return lots and lots of rows. If so, you may be network-bound. If so, then think about whether you can do more processing in SQL without shoveling a million rows to the client.

Related Solutions

Mysql – Optimizing ORDER BY for simple MySQL query

The EXPLAIN SELECT you posted definitely seems counter-intuitive.

If your query included WHERE s.id = ... then the query plan you're seeing might make a little bit more sense, but I'm assuming you're not.

It looks like the optimizer is getting distracted by the facts that supplier is a smaller table and that the supplier_id index in the po table can be used as a covering index... and with those facts in hand, it's overlooking the seemingly-obvious fact that the tables should be read in the opposite order than the one it chooses.

Here are two alternatives.

-- use the STRAIGHT_JOIN directive to insist that the optimizer process the tables in only the listed order:

SELECT STRAIGHT_JOIN * FROM `po` 
INNER JOIN po_suppliers s ON po.supplier_id = s.id
ORDER BY po.id ASC
LIMIT 10;

-- use the FORCE KEY index hint to direct the optimizer to prefer the primary key of the po table:

SELECT * FROM `po` FORCE KEY (PRIMARY) 
INNER JOIN po_suppliers s ON po.supplier_id = s.id
ORDER BY po.id ASC
LIMIT 10;

The first option is probably the better option, since FORCE KEY, in spite of the name, is still only a "hint" that the optimizer can choose to ignore, while STRAIGHT_JOIN genuinely does force the hand of the optimizer to join the tables in the order they're listed.

Mysql – How to speed up this slow simple query

Try adding two indexes:

ALTER TABLE game
  ADD INDEX deleted_sportid_hometeam_date_IX          -- choose names
    (deleted, sportid, hometeam, date), 
  ADD INDEX deleted_sportid_awayteam_date_IX          -- for the indexes
    (deleted, sportid, awayteam, date)  ;

and then running this version:

SELECT * FROM game 
WHERE hometeam = 29
  AND sportid = 1 
  AND date BETWEEN '2012-07-01' AND '2013-06-30'
  AND deleted = 0

UNION ALL

SELECT * FROM game
WHERE awayteam = 29 
  AND sportid = 1 
  AND date BETWEEN '2012-07-01' AND '2013-06-30'
  AND deleted = 0 ;

You can still try your original query after having added the 2 indexes and the efficiency will probably be improved (but not much, depends on the selectivity of the deleted - sportid combination.) But I don't think the optimizer is smart enough to understand that this (rewriting) is equivalent to that (original query) and that it could use both indexes, one for the 1st and the other for the 2nd part, so you will only have a small efficiency gain.

So, the "why is it faster this way" has mainly to do with the OR condition and with the limited ability of the optimizer to rewrite queries in equivalent forms. This rewriting for example would never be produced anyway by an automated optimizer (because I assumed that hometeam and awayteam are never the same, which the optimizer cannot guess.) If home team and away team could be the same, then we'd have to use UNION (not UNION ALL).

B-tree indexes are super-good for conditions with AND, only. When there are ORs or otherwise complicated conditions, they are not so good. Occasionally - like in this case - there is a way to rewrite a query so it has only ANDs. The last part of the execution (the UNION ALL) can be executed in MySQL by running the two subqueries one after the other and showing all results from both so there is no calculations or delay there.

Best Answer

Related Solutions

Mysql – Optimizing ORDER BY for simple MySQL query

Mysql – How to speed up this slow simple query

Related Question