First, look at the EXPLAIN plan again.
The first line says that the Query Optimizer will do the following:
- Full table scan of at least 7,706,280 rows without using any keys (this is the root cause of the `Sending data` query state)
- Each row that passes through from `esp_game` triggers a lookup into `esp_gameplayer` and `gameTypeMap`
I would like you to try the following:
Refactor Your Query
Your original query is:
```sql
SELECT
    g.gameID, g.gameType, g.mapID,
    g.createDate, gt.gameTypePortugueseDesc
FROM
    esp_game g
    INNER JOIN esp_gameplayer gp ON g.gameID = gp.gameID AND gp.summonerID = 401129
    INNER JOIN gameTypeMap gt ON g.gameType = gt.gameTypeID
ORDER BY g.createDate DESC LIMIT 10;
```
If the `summonerID` drives which gameIDs to retrieve, then `esp_gameplayer` needs to be the lead table in the query:
```sql
SELECT
    g.gameID, g.gameType, g.mapID,
    g.createDate, gt.gameTypePortugueseDesc
FROM
    (SELECT gameID FROM esp_gameplayer WHERE summonerID = 401129) gp
    INNER JOIN esp_game g ON g.gameID = gp.gameID
    INNER JOIN gameTypeMap gt ON g.gameType = gt.gameTypeID
ORDER BY g.createDate DESC LIMIT 10;
```
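To see why the key order matters, here is a minimal sketch (not the poster's actual setup): SQLite stands in for MySQL, with synthetic rows, purely to show that a composite key led by `summonerID` turns the `summonerID = 401129` filter into an index search instead of a full scan.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Two-column stand-in for esp_gameplayer, keyed summonerID-first.
con.execute("""
    CREATE TABLE esp_gameplayer (
        gameID     INTEGER NOT NULL,
        summonerID INTEGER NOT NULL,
        PRIMARY KEY (summonerID, gameID)
    ) WITHOUT ROWID
""")
# Synthetic data: every 100th game belongs to summoner 401129.
con.executemany("INSERT INTO esp_gameplayer VALUES (?, ?)",
                [(g, 401129 if g % 100 == 0 else g) for g in range(1, 10001)])

plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT gameID FROM esp_gameplayer WHERE summonerID = 401129"
).fetchall()
detail = " ".join(row[-1] for row in plan)
print(detail)  # e.g. "SEARCH esp_gameplayer USING PRIMARY KEY (summonerID=?)"

rows = con.execute(
    "SELECT COUNT(*) FROM esp_gameplayer WHERE summonerID = 401129"
).fetchone()[0]
```

The plan reports a SEARCH on the primary key rather than a SCAN of the whole table, which is exactly the behavior the rewritten query is trying to get out of MySQL.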
Change the `esp_gameplayer` PRIMARY KEY

Your table is currently keyed with `gameID` first:

```sql
CREATE TABLE IF NOT EXISTS `esp_gameplayer` (
  `gameID` int(11) unsigned NOT NULL,
  `summonerID` int(11) unsigned NOT NULL,
  `championID` int(11) NOT NULL,
  `teamID` int(11) NOT NULL,
  `isUpdated` tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (`gameID`,`summonerID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
```
To reverse the PRIMARY KEY, do the following:

```sql
CREATE TABLE IF NOT EXISTS `esp_gameplayer_new` (
  `gameID` int(11) unsigned NOT NULL,
  `summonerID` int(11) unsigned NOT NULL,
  `championID` int(11) NOT NULL,
  `teamID` int(11) NOT NULL,
  `isUpdated` tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (`summonerID`,`gameID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

INSERT INTO esp_gameplayer_new SELECT * FROM esp_gameplayer;
ALTER TABLE esp_gameplayer RENAME esp_gameplayer_old;
ALTER TABLE esp_gameplayer_new RENAME esp_gameplayer;
```
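The create-copy-rename swap above can be exercised end to end. This sketch uses SQLite (whose syntax is `RENAME TO` rather than MySQL's bare `RENAME`) and synthetic rows, just to show that the rebuilt table ends up with the reversed key and all of the original data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Existing table, keyed gameID-first (as in the question).
con.execute("CREATE TABLE esp_gameplayer (gameID INTEGER, summonerID INTEGER, "
            "PRIMARY KEY (gameID, summonerID))")
con.executemany("INSERT INTO esp_gameplayer VALUES (?, ?)",
                [(g, g % 7) for g in range(1, 1001)])

# New table with the reversed key, then copy and swap -- the same three
# steps as the MySQL recipe above.
con.execute("CREATE TABLE esp_gameplayer_new (gameID INTEGER, summonerID INTEGER, "
            "PRIMARY KEY (summonerID, gameID))")
con.execute("INSERT INTO esp_gameplayer_new SELECT * FROM esp_gameplayer")
con.execute("ALTER TABLE esp_gameplayer RENAME TO esp_gameplayer_old")
con.execute("ALTER TABLE esp_gameplayer_new RENAME TO esp_gameplayer")

n = con.execute("SELECT COUNT(*) FROM esp_gameplayer").fetchone()[0]
print(n)  # 1000 -- every row survived the swap
```

Keeping `esp_gameplayer_old` around, as the recipe does, gives you an instant rollback path if anything goes wrong.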
Since you asked for a specific `summonerID`, reversing the PRIMARY KEY gathers all the corresponding gameIDs for summoner 401129 much faster.

Make the changes and rerun the EXPLAIN plan. It should improve a great deal.
Give it a Try !!!
I bear bad news. I've studied your tables and your query, looked at many variations on them, and run them against a synthetic data set approximately the same size as yours. I'm sorry to report that I'm 98% sure your query is nearly optimal and that you already have the right indexes in place. I welcome anyone who can disprove this and come up with additional indexes plus rewritten SQL that significantly improve performance.
Aggregation != Searching.
You're doing OLAP. Your query appears to aggregate over about 5 million rows. Normally, if I see a query doing a full table scan or a wide index range scan, I suspect that the indexes are not optimal, unless (big unless here) the query is doing an aggregation such as `COUNT`, `SUM`, or `AVG`. In that case there's no way around accessing all those rows, since you need to compute over them.
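You can watch this happen in any database. A quick illustration (SQLite and a made-up `clicks` table, not the poster's schema): an aggregate over every row plans as a full scan even though an index exists on the table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE clicks (shorturl TEXT, hits INTEGER)")
con.execute("CREATE INDEX idx_shorturl ON clicks (shorturl)")
con.executemany("INSERT INTO clicks VALUES (?, ?)",
                [(f"u{i % 50}", 1) for i in range(5000)])

# SUM over every row: the planner has no choice but to read them all,
# because the index on shorturl cannot answer SUM(hits) by itself.
plan = con.execute("EXPLAIN QUERY PLAN SELECT SUM(hits) FROM clicks").fetchall()
detail = " ".join(r[-1] for r in plan)
print(detail)  # e.g. "SCAN clicks"

total = con.execute("SELECT SUM(hits) FROM clicks").fetchone()[0]
```

No additional index can make that scan go away; only reducing the number of rows the query must touch (pre-aggregation, below) can.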
I don't believe you are going to get significantly quicker results simply by adding indexes and rewriting the SQL alone. You'll need to incorporate some other approaches:
- Pre-aggregating data. You could have some background job that runs and pre-aggregates your data by time intervals. This could be by 1-day intervals, 1-hour, 5-minutes, 1-minute ... whatever makes sense for your application in terms of how much latency you can tolerate.
- Partitioning your data could help. I was thinking you might get some improvement by hash partitioning `test_log` over `HASH(shorturl)` into 32 hash partitions. But you'd need to redefine your PK, and maintaining the lifecycle of the data could become more difficult (I'm not sure whether you're purging data already, but you probably should, or at least archive it).
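The pre-aggregation idea from the first bullet can be sketched in a few lines. Everything here is hypothetical (made-up click events, a 5-minute bucket size): a background job folds raw rows into per-interval counts, so queries read the small rollup instead of millions of raw rows.

```python
from collections import Counter

BUCKET = 300  # 5-minute buckets, in seconds

def bucket_of(ts: int) -> int:
    """Floor a Unix timestamp to the start of its 5-minute bucket."""
    return ts - (ts % BUCKET)

# Hypothetical raw events: (unix_ts, shorturl). A background job would fold
# these into a small rollup table instead of aggregating 5M rows per query.
events = [(1000, "a"), (1100, "a"), (1250, "b"), (1500, "a"), (1700, "b")]

rollup = Counter()
for ts, url in events:
    rollup[(bucket_of(ts), url)] += 1

print(sorted(rollup.items()))
# [((900, 'a'), 2), ((1200, 'b'), 1), ((1500, 'a'), 1), ((1500, 'b'), 1)]
```

The bucket size is the latency knob: smaller buckets mean fresher numbers but a bigger rollup table.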
Bigger picture: it looks like you're using an RDBMS (MySQL) to do real-time keyword/click analysis. However, I don't believe an RDBMS is ideally suited for this task. Big data and stream processing are really becoming hot on this exact problem. The following search might get you in the right direction: http://www.google.com/search?q=stream+computing+real-time+click+analysis
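The hash-partitioning suggestion in the list above boils down to routing each `shorturl` to one of 32 buckets by its hash. In MySQL the server computes the hash itself; this sketch (stdlib `crc32` as a stand-in hash, made-up URLs) just shows how rows spread and why a single-URL lookup then touches only one partition:

```python
import zlib

N_PARTITIONS = 32

def partition_of(shorturl: str) -> int:
    # Stable hash (crc32) so the same URL always lands in the same
    # partition; MySQL would compute its own hash server-side.
    return zlib.crc32(shorturl.encode()) % N_PARTITIONS

urls = [f"http://x.co/{i}" for i in range(10000)]
counts = [0] * N_PARTITIONS
for u in urls:
    counts[partition_of(u)] += 1

# A lookup for one shorturl now needs to visit a single partition.
p = partition_of("http://x.co/42")
print(p, min(counts), max(counts))
```

Roughly uniform partition counts are the point: each partition holds about 1/32 of the data, so per-URL scans shrink accordingly.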
Best Answer
This is very simply a case where your server does not have sufficient capacity (often disk I/O, but sometimes memory or CPU) to handle the workload. Poorly-written queries and insufficient or sub-optimal indexes are a common contributor, as are applications that give up too soon and re-send the same query that's already running, and an improperly sized `innodb_buffer_pool_size` with InnoDB.

`Sending data` was a very poorly-chosen name for the thread state. It simply does not mean that the server is necessarily "sending" anything. The server may, in fact, still be doing table scans and may have nothing, yet, to send anywhere, and the thread will still be in this state. A more appropriate name for the state might have been "preparing data to send."

By contrast, when the server is in fact actually sending data to the client, the state is `Writing to net`, and it's pretty rare to actually see that in the processlist unless the resultset is particularly large or the client is particularly slow about accepting data from the network.