I have a fairly simple query that I've failed to optimize sufficiently despite adding a bunch of indexes.
The query is:
SELECT max(elapsed_seconds)
FROM sql_queries
WHERE created_at >= now() - interval 1 week
GROUP BY `sql`
I've tried adding the following indexes:
KEY `sql_queries_sql_index` (`sql`),
KEY `sql_queries_elapsed_seconds_index` (`elapsed_seconds`),
KEY `sql_queries_created_at_index` (`created_at`),
KEY `sql_queries_sql_created_at_index` (`sql`,`created_at`),
KEY `sql_queries_sql_elapsed_seconds_index` (`sql`,`elapsed_seconds`),
KEY `sql_queries_created_at_sql_elapsed_seconds` (`created_at`,`sql`,`elapsed_seconds`)
Obviously there's too many (and redundant) indexes — I just kept adding them hoping the query would run faster.
The table has 24 million rows and the query currently takes about four minutes.
"explain" shows:
+----+-------------+-------------+------------+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------+---------+------+----------+----------+-----------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------+---------+------+----------+----------+-----------------------------------------------------------+
| 1 | SIMPLE | sql_queries | NULL | range | sql_queries_sql_index,sql_queries_created_at_index,sql_queries_sql_created_at_index,sql_queries_sql_elapsed_seconds_index,sql_queries_created_at_sql_elapsed_seconds | sql_queries_created_at_sql_elapsed_seconds | 5 | NULL | 11574092 | 100.00 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+-------------+------------+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------+---------+------+----------+----------+-----------------------------------------------------------+
The column definitions are:
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`sql` varchar(191) COLLATE utf8mb4_unicode_ci NOT NULL,
`elapsed_seconds` double DEFAULT NULL,
`created_at` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
I'd like to do avg and count on each group, but I've omitted that to simplify discussions.
Any ideas / tips are greatly appreciated.
—
Update 1: I tried simplifying the query by hard-coding the date like this:
select max(elapsed_seconds) from sql_queries where created_at >= '2018-4-22' group by `sql`;
The query time (after doing a first query to warm up the caches) decreases from 2 min 5 sec to 1 min 53 secs. So not a significant improvement.
Update 2:
Here's the explain statements for this simplified query:
mysql> explain select max(elapsed_seconds) from sql_queries where created_at >= '2018-4-22' group by `sql`;
+----+-------------+-------------+------------+-------+-----------------------------------------------------------------------------------------------+--------------------------------------------+---------+------+----------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+-------+-----------------------------------------------------------------------------------------------+--------------------------------------------+---------+------+----------+----------+--------------------------+
| 1 | SIMPLE | sql_queries | NULL | index | sql_queries_sql_index,sql_queries_created_at_index,sql_queries_sql_created_at_elapsed_seconds | sql_queries_sql_created_at_elapsed_seconds | 780 | NULL | 29773986 | 50.00 | Using where; Using index |
+----+-------------+-------------+------------+-------+-----------------------------------------------------------------------------------------------+--------------------------------------------+---------+------+----------+----------+--------------------------+
Update 3:
As a sanity check I tried removing the date constraint and added an index on (sql, elapsed_seconds). The query was then instantaneous.
Best Answer
There are only three serious candidates:
Both are "covering". That is, the query can be handled entirely in the index.
EXPLAIN
indicates such by sayingUsing index
.Analysis:
filter first. But then the rest of the index is not in any useful order. So it sorts to do the
GROUP BY
and eventually finds the max. It cannot simply reach for the 'last' entry to getMAX
. I don't think either of these is better than the other of the two.might avoid the sort, since the
sql
values come one at a time. Also, the Optimizer might be able to jump to the starting point in the index for the desiredcreated_at
(for eachsql
). Again, it cannot simply reach for the 'last' entry to getMAX
.I vote for #3. However, this is an area where there have been optimization improvements. That is, an older version of MySQL may not do, for example, the leapfrogging.