MySQL – Stuck in EXPLAIN Statement: Troubleshooting Guide

explainMySQL

I am trying to debug the following query.

# Projects with at least 50 members with at least 240 yearly commits each
select project_id, count(*) as active_members from
(
        select project_members.repo_id as project_id, project_members.user_id, count(*)
        from project_members
        inner join yearly_project_commits on project_members.user_id = yearly_project_commits.committer_id and project_members.repo_id = yearly_project_commits.project_id
        group by project_members.repo_id, project_members.user_id
        having count(*) > 240
) as active_member_projects
group by project_id
having count(*) > 50;

Originally, it run for days without producing any result. At the point where I interrupted the query's execution, MySQL was not consuming any significant CPU time and not issuing any system calls (shown by running strace on the mysqld process). At that point the output of show processlist gave the following result.

+----+-----------+-----------+-----------+---------+--------+--------------+------------------------------------------------------------------------------------------------------+
| Id | User      | Host      | db        | Command | Time   | State        | Info                                                                                                 |
+----+-----------+-----------+-----------+---------+--------+--------------+------------------------------------------------------------------------------------------------------+
| 55 | ghtorrent | localhost | ghtorrent | Query   | 217907 | Sending data | select project_id, count(*) as active_members from
(
        select project_members.repo_id as project_id,  |
| 69 | ghtorrent | localhost | ghtorrent | Query   |      0 | NULL         | show processlist                                                                                     |
+----+-----------+-----------+-----------+---------+--------+--------------+------------------------------------------------------------------------------------------------------+

I also tried running EXPLAIN on the query, but this also gets stuck, with show processlist showing the following.

+----+-----------+-----------+-----------+---------+------+------------------------------+------------------------------------------------------------------------------------------------------+
| Id | User      | Host      | db        | Command | Time | State                        | Info                                                                                                 |
+----+-----------+-----------+-----------+---------+------+------------------------------+------------------------------------------------------------------------------------------------------+
| 10 | ghtorrent | localhost | ghtorrent | Query   |  564 | Copying to tmp table on disk | explain select populous_projects.name, members, count(populous_projects.project_id) as yearly_projec |
| 40 | ghtorrent | localhost | ghtorrent | Query   |    1 | NULL                         | show processlist                                                                                     |
+----+-----------+-----------+-----------+---------+------+------------------------------+------------------------------------------------------------------------------------------------------+

Running EXPLAIN on the inner query gives me

+----+-------------+------------------------+-------+-------------------------+------------+---------+-----------------------------------+-------+-------------+
| id | select_type | table                  | type  | possible_keys           | key        | key_len | ref                               | rows  | Extra       |
+----+-------------+------------------------+-------+-------------------------+------------+---------+-----------------------------------+-------+-------------+
|  1 | SIMPLE      | project_members        | index | PRIMARY,user_id         | PRIMARY    | 8       | NULL                              | 53530 | Using index |
|  1 | SIMPLE      | yearly_project_commits | ref   | committer_id,project_id | project_id | 4       | ghtorrent.project_members.repo_id |   113 | Using where |
+----+-------------+------------------------+-------+-------------------------+------------+---------+-----------------------------------+-------+-------------+

and I see that indices are defined for the fields used.

mysql> show indexes from yearly_project_commits;
+------------------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table                  | Non_unique | Key_name     | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| yearly_project_commits |          1 | committer_id |            1 | committer_id | A         |     2535097 |     NULL | NULL   | YES  | BTREE      |         |               |
| yearly_project_commits |          1 | project_id   |            1 | project_id   | A         |     7134168 |     NULL | NULL   |      | BTREE      |         |               |
+------------------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

mysql> show indexes from project_members;
+-----------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table           | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| project_members |          0 | PRIMARY  |            1 | repo_id     | A         |     5559879 |     NULL | NULL   |      | BTREE      |         |               |
| project_members |          0 | PRIMARY  |            2 | user_id     | A         |     5559879 |     NULL | NULL   |      | BTREE      |         |               |
| project_members |          1 | user_id  |            1 | user_id     | A         |     5559879 |     NULL | NULL   |      | BTREE      |         |               |
+-----------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

The corresponding create table commands are as follows.

+-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table           | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
+-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| project_members | CREATE TABLE `project_members` (
  `repo_id` int(11) NOT NULL,
  `user_id` int(11) NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `ext_ref_id` varchar(24) NOT NULL DEFAULT '0',
  PRIMARY KEY (`repo_id`,`user_id`),
  KEY `user_id` (`user_id`),
  CONSTRAINT `project_members_ibfk_1` FOREIGN KEY (`repo_id`) REFERENCES `projects` (`id`),
  CONSTRAINT `project_members_ibfk_2` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
| yearly_project_commits | CREATE TABLE `yearly_project_commits` (
  `id` int(11) NOT NULL DEFAULT '0',
  `author_id` int(11) DEFAULT NULL,
  `committer_id` int(11) DEFAULT NULL,
  `project_id` int(11) NOT NULL DEFAULT '0',
  `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  KEY `committer_id` (`committer_id`),
  KEY `project_id` (`project_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
+------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

The output of show full processlist at the time EXPLAIN is running is the following.

+----+-----------+-----------+-----------+---------+-------+--------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Id | User      | Host      | db        | Command | Time  | State        | Info                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
+----+-----------+-----------+-----------+---------+-------+--------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 43 | ghtorrent | localhost | ghtorrent | Query   | 53434 | Sending data | explain
    select project_id, count(*) as active_members from
    (
            select project_members.repo_id as project_id, project_members.user_id, count(*)
            from project_members
            inner join yearly_project_commits on project_members.user_id = yearly_project_commits.committer_id and project_members.repo_id = yearly_project_commits.project_id
            group by project_members.repo_id, project_members.user_id
            having count(*) > 240
    ) as active_member_projects
    group by project_id
    having count(*) > 50 |
| 47 | ghtorrent | localhost | ghtorrent | Query   |     0 | NULL         | SHOW FULL PROCESSLIST                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
+----+-----------+-----------+-----------+---------+-------+--------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

These things happen on an otherwise idle machine running MySQL Server version 5.5.43-0+deb8u1 (Debian) under Debian GNU/Linux 8.0 (jessie). The machine has 24GB of RAM and configured with innodb_buffer_pool_size=1GB.

All tables apart from yearly_project_commits, which uses MyISAM (it is a read-only table created to simplify a more complex query) use InnoDB.

Best Answer

For some queries EXPLAIN tries to run some subqueries to get the stats - it is a known bug which is fixed in 5.6 and in MariaDB (10 at least, not sure about 5.5).

Your indexes are not good enough - you have two single-column indexes on yearly_project_commits - but the query can use only one at a time.

You should create one multicolumn index with both the columns (project_id, committer_id) - that should turn the subquery into index-only scan with direct access in join and make it substantially faster.