Mysql – How to speed up this slow simple query

MySQL

SELECT * FROM `game` 
WHERE (hometeam=29 OR awayteam=29) 
  AND `sportid`=1 
  AND`date` BETWEEN '2012-07-01' AND '2013-06-30'
  AND deleted=0

This simple query takes 0.5-1.1 seconds (without caching). The table holes 1.6M rows, so I think it should be much faster (<0.1 second). In the explain below, it shows it's searching through half of the table. You can see I've tried various index variations with no luck, it always wants to use the sportid index.

id | select_type | table | type | possible_keys                                                                            | key     | key_len | ref   | rows   | Extra
1  | SIMPLE      | game  | ref  | date,hometeam,awayteam,sportid,deleted,sportid_2,sportid_3,sportid_4,sportid_5,sportid_6 | sportid | 1       | const | 800368 | Using where

SHOW CREATE TABLE game

CREATE TABLE `game` (
 `gameid` int(11) unsigned NOT NULL AUTO_INCREMENT,
 `sportid` tinyint(1) unsigned NOT NULL DEFAULT '1',
 `hometeam` smallint(5) unsigned NOT NULL,
 `awayteam` smallint(5) unsigned NOT NULL,
 `homescore` tinyint(4) unsigned NOT NULL,
 `awayscore` tinyint(4) unsigned NOT NULL,
 `stadiumid` smallint(5) unsigned NOT NULL,
 `neutral` tinyint(1) unsigned NOT NULL DEFAULT '0',
 `deleted` tinyint(1) unsigned NOT NULL DEFAULT '0',
 `date` date NOT NULL DEFAULT '0000-00-00',
 `time` time NOT NULL DEFAULT '00:00:00',
 `confid` smallint(5) unsigned DEFAULT NULL,
 `srs-predict` tinyint(3) unsigned NOT NULL,
 `srs-teamid` smallint(5) unsigned NOT NULL,
 `crs-predict` tinyint(3) unsigned NOT NULL,
 `crs-teamid` smallint(5) unsigned NOT NULL,
 `winner` smallint(5) unsigned DEFAULT NULL,
 `forfeit` tinyint(1) DEFAULT NULL,
 `playoff` tinyint(1) unsigned DEFAULT NULL,
 `H1` tinyint(3) unsigned DEFAULT NULL,
 `H2` tinyint(3) unsigned DEFAULT NULL,
 `H3` tinyint(3) unsigned DEFAULT NULL,
 `H4` tinyint(3) unsigned DEFAULT NULL,
 `H5` tinyint(3) unsigned DEFAULT NULL,
 `H6` tinyint(3) unsigned DEFAULT NULL,
 `H7` tinyint(3) unsigned DEFAULT NULL,
 `H8` tinyint(3) unsigned DEFAULT NULL,
 `A1` tinyint(3) unsigned DEFAULT NULL,
 `A2` tinyint(3) unsigned DEFAULT NULL,
 `A3` tinyint(3) unsigned DEFAULT NULL,
 `A4` tinyint(3) unsigned DEFAULT NULL,
 `A5` tinyint(3) unsigned DEFAULT NULL,
 `A6` tinyint(3) unsigned DEFAULT NULL,
 `A7` tinyint(3) unsigned DEFAULT NULL,
 `A8` tinyint(3) unsigned DEFAULT NULL,
 `reportedby` int(10) unsigned DEFAULT NULL,
 `lastscandate` date DEFAULT NULL,
 PRIMARY KEY (`gameid`),
 KEY `winner` (`winner`),
 KEY `date` (`date`),
 KEY `hometeam` (`hometeam`),
 KEY `awayteam` (`awayteam`),
 KEY `sportid` (`sportid`),
 KEY `lastscandate` (`lastscandate`),
 KEY `deleted` (`deleted`),
 KEY `sportid_2` (`sportid`,`awayteam`,`date`),
 KEY `sportid_3` (`sportid`,`hometeam`,`date`),
 KEY `sportid_4` (`sportid`,`awayteam`,`date`,`deleted`),
 KEY `sportid_5` (`sportid`,`hometeam`,`date`,`deleted`),
 KEY `sportid_6` (`sportid`,`hometeam`,`awayteam`,`date`,`deleted`)
) ENGINE=InnoDB AUTO_INCREMENT=1969140 DEFAULT CHARSET=utf8

Best Answer

Try adding two indexes:

ALTER TABLE game
  ADD INDEX deleted_sportid_hometeam_date_IX          -- choose names
    (deleted, sportid, hometeam, date), 
  ADD INDEX deleted_sportid_awayteam_date_IX          -- for the indexes
    (deleted, sportid, awayteam, date)  ;

and then running this version:

SELECT * FROM game 
WHERE hometeam = 29
  AND sportid = 1 
  AND date BETWEEN '2012-07-01' AND '2013-06-30'
  AND deleted = 0

UNION ALL

SELECT * FROM game
WHERE awayteam = 29 
  AND sportid = 1 
  AND date BETWEEN '2012-07-01' AND '2013-06-30'
  AND deleted = 0 ;

You can still try your original query after having added the 2 indexes and the efficiency will probably be improved (but not much, depends on the selectivity of the deleted - sportid combination.) But I don't think the optimizer is smart enough to understand that this (rewriting) is equivalent to that (original query) and that it could use both indexes, one for the 1st and the other for the 2nd part, so you will only have a small efficiency gain.

So, the "why is it faster this way" has mainly to do with the OR condition and with the limited ability of the optimizer to rewrite queries in equivalent forms. This rewriting for example would never be produced anyway by an automated optimizer (because I assumed that hometeam and awayteam are never the same, which the optimizer cannot guess.) If home team and away team could be the same, then we'd have to use UNION (not UNION ALL).

B-tree indexes are super-good for conditions with AND, only. When there are ORs or otherwise complicated conditions, they are not so good. Occasionally - like in this case - there is a way to rewrite a query so it has only ANDs. The last part of the execution (the UNION ALL) can be executed in MySQL by running the two subqueries one after the other and showing all results from both so there is no calculations or delay there.

Best Answer

Related Solutions

Mysql – High Disk IO, How to mitigate

Mysql – EXPLAIN output suggests that the index is not being used

Related Question