MySQL – Why would splitting up a query make it faster, and can/should I fix this?

MySQL

The following query is for a web-based game. It sets the isTiedForHighScore column to true for all solutions that tied the score for a given "map."

UPDATE solutions
INNER JOIN
(
    SELECT mapID, score
    FROM solutions AS s2
    WHERE isHighScore = 1
) AS maxScore ON maxScore.mapID = solutions.mapID AND solutions.score = maxScore.score
INNER JOIN maps ON maps.ID = solutions.mapID
SET solutions.isTiedForHighScore = 1
WHERE maps.mapExpireTime >= @mapExpireTime
AND maps.mapExpireTime <= NOW()

(solutions.isHighScore = 1 will only be true for exactly one solution per map)

If I set @mapExpireTime to be 1, 2, or 3 months ago, this query runs in around 0.1s. However, if I set it to 4 months ago or longer, it takes about 50s!

Why would this happen (there's no large difference in the size of the dataset over the months)? I imagine it has something to do with all the data not being able to fit in memory – is this usual?

More importantly, is it standard practice to sometimes split a query into chunks (I could, say, send multiple queries, each one working on only one month's worth of maps at a time)? Or is there some way to tell MySQL itself that it's allowed to work on this query in chunks? I really have no idea how to speed up this query any more.


Here is the output of EXPLAIN (I had to change the UPDATE to a SELECT to get it to produce output):

id    select_type     table       type    possible_keys       key           key_len  ref                            rows     Extra
1     PRIMARY         <derived2>  ALL     NULL                NULL          NULL     NULL                           1462
1     PRIMARY         solutions   ref     mapIDuserID,mapID   mapID         8        maxScore.mapID,maxScore.moves  3      
1     PRIMARY         maps        ALL     mapExpireTime       NULL          NULL     NULL                           2953     Using where; Using join buffer
2     DERIVED         s2          ref     isHighScore         isHighScore   1                                       1585     Using where; Using temporary; Using filesort

Best Answer

There is no direct way to tell MySQL to work on a single query "in chunks," since SQL is (primarily, at least) a declarative language ("platform, I need you to accomplish this end result") not a procedural one ("platform, I need you to process this data using these steps: for each...").
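That said, you can do the chunking yourself from the application side by running the same statement once per date window. A sketch (the `@windowStart`/`@windowEnd` variables and the one-month windows are my assumption; your application would loop over them):

```sql
-- Run once per one-month window instead of one statement spanning many months.
-- @windowStart / @windowEnd are placeholders supplied by the application loop.
UPDATE solutions
INNER JOIN
(
    SELECT mapID, score
    FROM solutions AS s2
    WHERE isHighScore = 1
) AS maxScore ON maxScore.mapID = solutions.mapID
            AND solutions.score = maxScore.score
INNER JOIN maps ON maps.ID = solutions.mapID
SET solutions.isTiedForHighScore = 1
WHERE maps.mapExpireTime >= @windowStart
  AND maps.mapExpireTime <  @windowEnd;
```

Each window touches fewer maps rows, which keeps the range predicate selective enough for the optimizer to keep using the index.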

Memory might be your issue -- which storage engine do these tables use? ...but really, it seems like there are some more likely possibilities:

According to the EXPLAIN SELECT you posted, no index is being used to join the maps table (key is NULL), even though the mapExpireTime index is available. When the optimizer decides an index isn't selective enough, it won't use it: the more rows in maps your @mapExpireTime fails to eliminate, the less likely the optimizer is to choose that index, and the more likely it is to fall back to a full table scan. That would explain why performance falls off a cliff past a certain date range. With the shorter date ranges, does the index appear in the key column for the maps table in EXPLAIN? If so, that's the short answer to "why does it slow down?"
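To test that theory, compare the plan across date ranges and, if the range scan really is cheaper, override the optimizer with an index hint. (FORCE INDEX is standard MySQL syntax; I'm assuming the index is actually named `mapExpireTime`, per your possible_keys column.)

```sql
-- See whether the optimizer picks the mapExpireTime index for a given range:
EXPLAIN SELECT maps.ID
FROM maps
WHERE maps.mapExpireTime >= @mapExpireTime
  AND maps.mapExpireTime <= NOW();

-- If it doesn't, but the range scan is in fact faster, you can force it:
EXPLAIN SELECT maps.ID
FROM maps FORCE INDEX (mapExpireTime)
WHERE maps.mapExpireTime >= @mapExpireTime
  AND maps.mapExpireTime <= NOW();
```

If the forced plan is dramatically faster for the long ranges, that confirms the optimizer is mispricing the scan rather than the index being genuinely useless.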

It's possible that ANALYZE TABLE maps; would build better index statistics and improve the query plan. Conventional wisdom holds that the constant "shiftiness" of InnoDB index stats makes this pointless, but I've seen plenty of cases where the stats on an InnoDB table bias the optimizer toward a phenomenally bad query plan and ANALYZE TABLE cleans it right up. (I suspect that conventional wisdom is so widespread because SHOW TABLE STATUS itself triggers a random index-stats dive, so the act of looking changes the very thing you were trying to look at, but I digress.)
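For example, to rebuild the stats and then inspect what the optimizer sees (SHOW INDEX reports the estimated cardinality per index):

```sql
ANALYZE TABLE maps;
ANALYZE TABLE solutions;

-- Inspect the (estimated) cardinality the optimizer now sees:
SHOW INDEX FROM maps;
```

ANALYZE TABLE takes only a brief lock and is cheap to try, so it's a low-risk first step before changing the schema.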

Another possibility is adding an index on maps (ID, mapExpireTime). It looks quite redundant at first glance, but it could serve as a covering index for this query, which would be far better than the full table scan and join buffer being used now.
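A sketch of that index (the index name is arbitrary):

```sql
-- (ID, mapExpireTime) lets both the join on maps.ID and the date filter be
-- answered from the index alone, so the maps rows themselves never need to
-- be read.
ALTER TABLE maps ADD INDEX idx_id_expiretime (ID, mapExpireTime);
```

If it works, you'd expect the EXPLAIN row for maps to show this key with "Using index" in the Extra column instead of the join buffer.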