The following query is for a web-based game. It sets the isTiedForHighScore
column to true for all solutions that tied the score for a given "map."
UPDATE solutions
INNER JOIN
(
SELECT mapID, score
FROM solutions AS s2
WHERE isHighScore = 1
) AS maxScore ON maxScore.mapID = solutions.mapID AND solutions.score = maxScore.score
INNER JOIN maps ON maps.ID = solutions.mapID
SET solutions.isTiedForHighScore = 1
WHERE maps.mapExpireTime >= @mapExpireTime
AND maps.mapExpireTime <= NOW()
(solutions.isHighScore = 1 will be true for exactly one solution per map.)
If I set @mapExpireTime to 1, 2, or 3 months ago, this query runs in around 0.1s. However, if I set it to 4 months ago or longer, it takes about 50s!
Why would this happen (there's no large difference in the size of the dataset over the months)? I imagine it has something to do with all the data not being able to fit in memory – is this usual?
More importantly, is it standard practice to sometimes split a query into chunks (I could, say, send multiple queries, each one working on only one month's worth of maps at a time)? Or is there some way to tell MySQL itself that it's allowed to work on this query in chunks? I really have no idea how to speed up this query any more.
Here is the output of the EXPLAIN (I have to change the UPDATE to a SELECT to get this to output):
id  select_type  table       type  possible_keys      key          key_len  ref                            rows  Extra
1   PRIMARY      <derived2>  ALL   NULL               NULL         NULL     NULL                           1462
1   PRIMARY      solutions   ref   mapIDuserID,mapID  mapID        8        maxScore.mapID,maxScore.moves  3
1   PRIMARY      maps        ALL   mapExpireTime      NULL         NULL     NULL                           2953  Using where; Using join buffer
2   DERIVED      s2          ref   isHighScore        isHighScore  1                                       1585  Using where; Using temporary; Using filesort
Best Answer
There is no direct way to tell MySQL to work on a single query "in chunks," since SQL is (primarily, at least) a declarative language ("platform, I need you to accomplish this end result") not a procedural one ("platform, I need you to process this data using these steps: for each...").
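That said, you can of course chunk the work yourself from the application side, as you suggested, by issuing the same UPDATE once per window. A minimal sketch (the column and variable names match your query; the one-month INTERVAL step and the @windowStart variable are illustrative assumptions):

```sql
-- Hypothetical manual chunking: run the same UPDATE once per month-sized
-- window instead of over the whole date range at once.
SET @windowStart = @mapExpireTime;

-- Repeat this pair of statements (advancing the window each pass)
-- until @windowStart >= NOW():
UPDATE solutions
INNER JOIN (
    SELECT mapID, score
    FROM solutions AS s2
    WHERE isHighScore = 1
) AS maxScore ON maxScore.mapID = solutions.mapID
             AND solutions.score = maxScore.score
INNER JOIN maps ON maps.ID = solutions.mapID
SET solutions.isTiedForHighScore = 1
WHERE maps.mapExpireTime >= @windowStart
  AND maps.mapExpireTime < LEAST(@windowStart + INTERVAL 1 MONTH, NOW());

SET @windowStart = @windowStart + INTERVAL 1 MONTH;
```

A side benefit: a narrower date predicate eliminates more of the maps table, which also makes the optimizer more willing to use an index on mapExpireTime.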
Memory might be your issue -- which storage engine do these tables use? ...but really, it seems like there are some more likely possibilities:
According to the EXPLAIN SELECT you posted, no index is being used to join the maps table (key is NULL), even though the mapExpireTime index is available. When the optimizer determines that an index doesn't have sufficient selectivity, it won't be used -- and the more rows in the maps table your @mapExpireTime fails to eliminate, the less likely the optimizer is to select that index, opting instead for a full table scan, which may explain why performance falls off. On the shorter date ranges, does the index appear in key for the maps table in EXPLAIN? If so, that's the short answer to "why does it slow down?"

It's possible that
ANALYZE TABLE maps; might build some better index stats and improve the query performance. Conventional wisdom is that the constant "shiftiness" of InnoDB index stats makes this assertion untrue, but I've seen too many times where the stats on an InnoDB table are in a state that biases the optimizer toward a phenomenally bad query plan, and ANALYZE TABLE cleans it right up. (I think the reason this conventional wisdom is so widespread is that SHOW TABLE STATUS actually triggers a random index stats dive, so the act of looking makes what you intended to look at different than it was before you looked -- but I digress.)

Another possibility would be to add an index (ID, mapExpireTime) on the maps table. At first glance that seems quite redundant, but it might be used as a covering index, which would be far better than the full table scan and join buffer that's getting used now.