Mysql – Simple query is slow on 4M-rows table

MySQLoptimizationperformancequery-performance

I have in production a MySQL table pageviews with 4M rows that records page views of users on posts. I need to know which posts a specific user have read, but this request takes up to 15 seconds to execute:

SELECT post_id
FROM pageviews
WHERE user_id = 981
GROUP BY post_id

Here is the execution plan:

mysql> EXPLAIN SELECT post_id FROM visits WHERE user_id = 981 GROUP BY post_id;
+----+-------------+--------+------+---------------+---------+---------+-------+-------+----------------------------------------------+
| id | select_type | table  | type | possible_keys | key     | key_len | ref   | rows  | Extra                                        |
+----+-------------+--------+------+---------------+---------+---------+-------+-------+----------------------------------------------+
|  1 | SIMPLE      | visits | ref  | user_id       | user_id | 5       | const | 54696 | Using where; Using temporary; Using filesort |
+----+-------------+--------+------+---------------+---------+---------+-------+-------+----------------------------------------------+

I'm not sure how to look for the cause of the slowness: maybe the table is not well configured, the mysql server not well tuned, other queries locking stuff, … Or maybe just 4M rows is a good size to start partitioning.

Production database is on Amazon RDS

CREATE TABLE `pageviews` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `user_id` int(11) DEFAULT NULL,
  `post_id` int(11) DEFAULT NULL,
  `created_at` datetime NOT NULL,
  PRIMARY KEY (`id`),
  KEY `post_id` (`post_id`),
  KEY `user_id` (`user_id`),
  KEY `created_at` (`created_at`),
  CONSTRAINT `FK_444839EAA76ED395` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`),
  CONSTRAINT `visits_ibfk_2` FOREIGN KEY (`post_id`) REFERENCES `posts` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4587432 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Best Answer

In addition to @tombom's suggestions, creating an index on (user_id, post_id) instead of (or in addition, but the less indexes the better) separate indexes on user_id and post_id will simplify the query, probably getting rid of the filesort and temporary table, plus giving you the benefits of a covering index.

This will probably lower the query execution significantly if you have a large enough buffer pool and the query is relatively frequent.

If after doing that, the query is still slow, you will need to do (pre)caching in order to speed up the query execution.