How can I speed up this 2m5s query that has indices?
select urls.id as urlId,
count(case when s1.hit_type = 0 then 1 end) as aCount,
count(case when s1.hit_type = 1 then 1 end) as bCount,
count(case when s1.hit_type = 2 then 1 end) as cCount,
count(distinct s1.source_id) as sourcesCount
from urls join stats s1 on urls.id = s1.url_id
where s1.hit_date >= '2017-12-12'
group by urls.id
order by aCount desc
limit 0,100;
mysql> show create table stats;
| stats | CREATE TABLE `stats` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`url_id` varchar(100) DEFAULT NULL,
`hit_date` datetime DEFAULT NULL,
`hit_type` tinyint(4) DEFAULT NULL,
`source_id` bigint(20) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `url_id_idx` (`url_id`),
KEY `source_id` (`source_id`),
KEY `stats_hit_date_idx` (`hit_date`),
CONSTRAINT `stats_ibfk_1` FOREIGN KEY (`url_id`) REFERENCES `urls` (`ID`),
CONSTRAINT `stats_ibfk_2` FOREIGN KEY (`source_id`) REFERENCES `sources` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6027557 DEFAULT CHARSET=latin1 |
mysql> describe select...
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+--------+-------------------------------------------------------------------------------------------------+---------+---------+--------------------------+---------+----------------------------------------------+
| 1 | SIMPLE | s1 | ALL | url_id_idx,stats_hit_date_idx | NULL | NULL | NULL | 5869695 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | urls | eq_ref | PRIMARY,urls_email_idx,urls_status_idx,deptId_idx,deptId_status_email_idx | PRIMARY | 102 | db.s1.url_id | 1 | Using index |
It doesn't seem to be using the hit_date index or url_id index.
I tried using a sub-select (select count(*) from stats where url_id = ... and hit_date >= ... and hit_type = 0) as aCount
and it was faster and took 24s. Is there a way to make it less than 5s? The limit for the entire request is 30s.
MySQL Server version: 5.6.35-log MySQL Community Server (GPL)
Best Answer
Your query is equal to
except in your query output there are only records which "pairs" exists in
urls
table.But the constraint
do not allow those records.
So my query is absolutely equal to your one, and you can use it instead.
To increase this query speed you may create covering index
And the best way is to move
url_id
to a separate table and replace it with a reference of numeric type (grouping by VARCHAR field is expensive).Additionally -
count(case when s1.hit_type = N then 1 end)
can be replaced with shortSUM(s1.hit_type = N)
.To speed up the whole query I'd recommend try to divide it to 4 separate queries:
The index by
(url_id, hit_type, hit_date)
will speed up first 3 subqueries, and by(url_id, hit_date, source_id)
will speed up the last subquery.