Mysql – Sorting Result state taking very long time

innodbMySQLsorting

I am using a MySQL 5.1.69-log DB for analysis purpose.

I've designed my database in a star schema. All tables are using InnoDB.
My innodb_buffer_pool_size=2Gb

This is the query I am running :

select
     `dw_dimension_criteria`.`id` as `c0`,
     count(distinct `dw_fact_program_attendance`.`client_id`) as `m0`
from
     `dw_dimension_criteria` as `dw_dimension_criteria`,
     `dw_fact_program_attendance` as `dw_fact_program_attendance`
where
     `dw_fact_program_attendance`.`criteria_id` = `dw_dimension_criteria`.`id`
group by `dw_dimension_criteria`.`id`;

The EXPLAIN gives the following :

+----+-------------+----------------------------+-------+---------------+--------------+---------+-----------------------------------------+--------+-------------+
| id | select_type | table                      | type  | possible_keys | key          | key_len | ref                                     | rows   | Extra       |
+----+-------------+----------------------------+-------+---------------+--------------+---------+-----------------------------------------+--------+-------------+
|  1 | SIMPLE      | dw_dimension_criteria      | index | PRIMARY       | PRIMARY      | 2       | NULL                                    |      1 | Using index | 
|  1 | SIMPLE      | dw_fact_program_attendance | ref   | criteria_id   | criteria _id | 2       | clarity_nevada.dw_dimension_criteria.id | 417669 |             |
+----+-------------+----------------------------+-------+---------------+--------------+---------+-----------------------------------------+--------+-------------+

and the SHOW PROFILE command the following :

+----------------------+-----------+
| Status               | Duration  |
+----------------------+-----------+
| starting             |  0.000117 |
| checking permissions |  0.000035 |
| checking permissions |  0.000009 |
| Opening tables       |  0.000022 |
| System lock          |  0.000004 |
| Table lock           |  0.000011 |
| init                 |  0.000032 |
| optimizing           |  0.000012 |
| statistics           |  0.000045 |
| preparing            |  0.000017 |
| executing            |  0.000043 |
| Sorting result       |  0.000005 |
| Sending data         | 12.847928 |
| end                  |  0.000103 |
| removing tmp table   |  0.000016 |
| end                  |  0.000016 |
| query end            |  0.000005 |
| freeing items        |  0.000114 |
| logging slow query   |  0.000003 |
| logging slow query   |  0.000097 |
| cleaning up          |  0.000012 |
+----------------------+-----------+

I do not understand those 12 seconds. I do not understand if it belongs to Sorting Result or Sending Data. I tried adjusting sort_buffer_size but with no flagrant success. The fact table count about 7 millions rows.

Where should I look to drop those 12 seconds to a more reasonable time ?

Best Answer

If there is a foreign key from dw_fact_program_attendance(criteria_id) that REFERENCES dw_dimension_criteria(id), there is no need to join the two tables (but the optimizer is not yet smart enough to understand that.) You can have only one table and GROUP BY criteria_id:

SELECT
     a.criteria_id AS c0,
     COUNT(DISTINCT a.client_id) AS m0
FROM
     dw_fact_program_attendance AS a
GROUP BY
     a.criteria_id ;

An index on (criteria_id, client_id) would help efficiency.