Mysql – How to speed up this 2m5s query that has indices

MySQLmysql-5.6

How can I speed up this 2m5s query that has indices?

select urls.id as urlId, 
    count(case when s1.hit_type = 0 then 1 end) as aCount, 
    count(case when s1.hit_type = 1 then 1 end) as bCount, 
    count(case when s1.hit_type = 2 then 1 end) as cCount, 
    count(distinct s1.source_id) as sourcesCount 
from urls join stats s1 on urls.id = s1.url_id 
where s1.hit_date >= '2017-12-12' 
group by urls.id 
order by aCount desc 
limit 0,100;

mysql> show create table stats;

| stats | CREATE TABLE `stats` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `url_id` varchar(100) DEFAULT NULL,
  `hit_date` datetime DEFAULT NULL,
  `hit_type` tinyint(4) DEFAULT NULL,
  `source_id` bigint(20) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `url_id_idx` (`url_id`),
  KEY `source_id` (`source_id`),
  KEY `stats_hit_date_idx` (`hit_date`),
  CONSTRAINT `stats_ibfk_1` FOREIGN KEY (`url_id`) REFERENCES `urls` (`ID`),
  CONSTRAINT `stats_ibfk_2` FOREIGN KEY (`source_id`) REFERENCES `sources` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6027557 DEFAULT CHARSET=latin1 |

mysql> describe select...
| id | select_type | table   | type   | possible_keys                                                                                   | key     | key_len | ref                      | rows    | Extra                                        |
+----+-------------+---------+--------+-------------------------------------------------------------------------------------------------+---------+---------+--------------------------+---------+----------------------------------------------+
|  1 | SIMPLE      | s1      | ALL    | url_id_idx,stats_hit_date_idx                                                                   | NULL    | NULL    | NULL                     | 5869695 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | urls    | eq_ref | PRIMARY,urls_email_idx,urls_status_idx,deptId_idx,deptId_status_email_idx                       | PRIMARY | 102     | db.s1.url_id             |     1   | Using index                                  |

It doesn't seem to be using the hit_date index or url_id index.

I tried using a sub-select (select count(*) from stats where url_id = ... and hit_date >= ... and hit_type = 0) as aCount and it was faster and took 24s. Is there a way to make it less than 5s? The limit for the entire request is 30s.

MySQL Server version: 5.6.35-log MySQL Community Server (GPL)

Best Answer

Your query is equal to

select /* urls.id */ s1.url_id as urlId, 
    count(case when s1.hit_type = 0 then 1 end) as aCount, 
    count(case when s1.hit_type = 1 then 1 end) as bCount, 
    count(case when s1.hit_type = 2 then 1 end) as cCount, 
    count(distinct s1.source_id) as sourcesCount 
from /* urls join */ stats s1 /* on urls.id = s1.url_id */
where s1.hit_date >= '2017-12-12' 
group by /* urls.id */ s1.url_id
order by aCount desc 
limit 0,100;

except in your query output there are only records which "pairs" exists in urls table.

But the constraint

CONSTRAINT `stats_ibfk_1` FOREIGN KEY (`url_id`) REFERENCES `urls` (`ID`)

do not allow those records.

So my query is absolutely equal to your one, and you can use it instead.

To increase this query speed you may create covering index

ALTER TABLE stats ADD INDEX idx (url_id, hit_date, hit_type, source_id)

And the best way is to move url_id to a separate table and replace it with a reference of numeric type (grouping by VARCHAR field is expensive).


Additionally - count(case when s1.hit_type = N then 1 end) can be replaced with short SUM(s1.hit_type = N).


To speed up the whole query I'd recommend try to divide it to 4 separate queries:

SELECT urlId, 
       MAX(aCount) aCount, 
       MAX(bCount) bCount, 
       MAX(cCount) cCount, 
       MAX(sourcesCount) sourcesCount 
FROM (  select  s1.url_id as urlId, 
                COUNT(*) as aCount, 
                0 as bCount, 
                0 as cCount, 
                0 as sourcesCount 
        from stats s1 
        where s1.hit_date >= '2017-12-12' AND s1.hit_type = 0
        group by s1.url_id
      UNION ALL
        select  s1.url_id, 0, COUNT(*), 0, 0
        from stats s1 
        where s1.hit_date >= '2017-12-12'  AND s1.hit_type = 1
        group by s1.url_id
      UNION ALL
        select  s1.url_id as urlId, 0, 0, COUNT(*), 0
        from stats s1 
        where s1.hit_date >= '2017-12-12'  AND s1.hit_type = 2
        group by s1.url_id
      UNION ALL
        select  s1.url_id as urlId, 0, 0, 0, count(distinct s1.source_id)
        from stats s1 
        where s1.hit_date >= '2017-12-12' 
        group by s1.url_id
    ) x
GROUP BY urlId
order by aCount desc 
limit 0,100;

The index by (url_id, hit_type, hit_date) will speed up first 3 subqueries, and by (url_id, hit_date, source_id) will speed up the last subquery.