Mysql – Query with order by, group by and limit

group bylimitsmysql-5.5order-by

I have two tables: chains and documents.

CREATE TABLE `chains` (
  `chain_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
...
  PRIMARY KEY (`chain_id`),
...
) ENGINE=InnoDB

CREATE TABLE `docs` (
  `doc_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `chain_id` int(10) unsigned NOT NULL,
  `id_status` tinyint(3) unsigned NOT NULL COMMENT 'open,sent,inbox,read,etc',
  `date_created` int(10) unsigned NOT NULL DEFAULT '0',
...
  PRIMARY KEY (`doc_id`),
  KEY `id_status` (`id_status`),
  KEY `family` (`family`),
  KEY `chain_id` (`chain_id`),
  KEY `date_created` (`date_created`),
) ENGINE=InnoDB

One chain contains one or many documents.

I need to find all documents ordered by date_created and grouped by chain_id with limit.

SELECT `chains`.`chain_id`
FROM `chains`
JOIN `docs` USING (`chain_id`)
WHERE `docs`.`family`=1
AND `docs`.`id_status` IN (4,5)
AND `chains`.`is_archive`=0
GROUP BY `chain_id`
ORDER BY `date_created` DESC
LIMIT 0,200

Query time: 36.01 sec

EXPLAIN result:

+----+-------------+--------+------+---------------------------------+------------+---------+--------------------------+---------+----------------------------------------------+
| id | select_type | table  | type | possible_keys                   | key        | key_len | ref                      | rows    | Extra                                        |
+----+-------------+--------+------+---------------------------------+------------+---------+--------------------------+---------+----------------------------------------------+
|  1 | SIMPLE      | chains | ref  | PRIMARY,is_archive              | is_archive | 1       | const                    | 2789080 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | docs   | ref  | id_status,family,chain_id,test2 | chain_id   | 4       | for_test.chains.chain_id |       1 | Using where                                  |
+----+-------------+--------+------+---------------------------------+------------+---------+--------------------------+---------+----------------------------------------------+

How to optimize this query, or may be I need to redesign my tables?
May be some index can help me?

Best Answer

There are various ways to rewrite your query, so it uses indexes that cover it. For example:

SELECT d.chain_id
FROM docs AS d
WHERE d.family = 1
  AND d.id_status IN (4,5)
  AND EXISTS
      ( SELECT *
        FROM chains AS c
        WHERE c.is_archive = 0 
          AND c.chain_id = d.chain_id 
      )
GROUP BY d.chain_id
ORDER BY MAX(d.date_created) DESC
LIMIT 200 OFFSET 0 ;

or this:

SELECT d.chain_id
FROM docs AS d
  JOIN ( SELECT c.chain_id
         FROM chains AS c
         WHERE c.is_archive = 0 
       ) AS c 
    ON c.chain_id = d.chain_id 
WHERE d.family = 1
  AND d.id_status IN (4,5)
GROUP BY d.chain_id
ORDER BY MAX(d.date_created) DESC
LIMIT 200 OFFSET 0 ;

In combination with proper indexes that will "cover" the query. I'd start by adding these and testing for efficiency:

docs
(id_status, family, chain_id, date_created)

chains
(id_archive, chain_id)