MySQL: Does access to table data not in an index via a join causes a file-sort when ordered

MySQL

The inclusion of category.name in the SELECT, causes the query to perform a
Using index; Using temporary; Using filesort, and also the rows scanned are not affected by the LIMIT 0,1.

CREATE TABLE `item` (
  `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`)
) ENGINE=INNODB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE `category` (
  `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(255) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
  `created_at` TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` TIMESTAMP NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=INNODB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE `feature` (
  `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  `item_id` INT(10) UNSIGNED DEFAULT NULL,
  `category_id` INT(10) UNSIGNED DEFAULT NULL,
  `start_date` DATE DEFAULT NULL,
  `created_at` TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `tbl_feature_id_item_id_foreign` (`item_id`),
  KEY `tbl_feature_id_category_id_foreign` (`category_id`),
  CONSTRAINT `tbl_feature_id_item_id_foreign` FOREIGN KEY (`item_id`) REFERENCES `item` (`id`),
  CONSTRAINT `tbl_feature_id_category_id_foreign` FOREIGN KEY (`category_id`) REFERENCES `category` (`id`)
) ENGINE=INNODB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

INSERT INTO `item` (`id`) VALUES (1),(2),(3),(4),(5);

INSERT INTO `category` (`id`, `name`, `created_at`, `updated_at`)
  VALUES
    (1, 'a', '2016-12-02 22:49:46', NULL),
    (2, 'b', '2016-12-02 22:49:48', NULL)
;

INSERT INTO `feature` (`id`, `item_id`, `category_id`, `start_date`, `created_at`, `updated_at`)
VALUES
(1,  1, 1, '2016-12-01', NOW(), NOW()),
(2,  1, 2, '2016-12-02', NOW(), NOW()),
(3,  2, 1, '2016-12-01', NOW(), NOW()),
(4,  2, 2, '2016-12-02', NOW(), NOW()),
(5,  3, 1, '2016-12-01', NOW(), NOW()),
(6,  3, 2, '2016-12-02', NOW(), NOW()),
(7,  4, 1, '2016-12-01', NOW(), NOW()),
(8,  4, 2, '2016-12-02', NOW(), NOW()),
(9,  5, 1, '2016-12-01', NOW(), NOW()),
(10, 5, 2, '2016-12-02', NOW(), NOW())
;

EXPLAIN EXTENDED
SELECT 
    item.id
  , feature.id
  , category.id
  , category.name -- Is the cause of the temporary table; file-sort;
FROM 
  item 
LEFT JOIN feature ON (
  feature.item_id = item.id 
  AND feature.start_date = (
    SELECT 
      MAX(start_date) AS start_date 
    FROM 
      feature 
    WHERE 
      feature.item_id = item.id 
  )
) 
LEFT JOIN category ON (
  category.id = feature.category_id
)
ORDER BY item.id DESC -- or ASC
LIMIT 0, 1 -- Is ignored in the table scan
;

MySQL 5.7.16

enter image description here

I believe it may be because of the need to access the source table after the sort is done. because the file-sort is performed in Mode 2

http://s.petrunia.net/blog/?p=24

Mode 2: sort pairs and produce a sequence of rowids which one can use to get source table’s rows in the required order (but this will be essentially hit the table in random order and is not very fast)

Adding an index to category.name does resolve the issue. Is this is because category.name is now part of the index?

Can anybody confirm, and possibly provide a better solution than adding the index, as in reality, it does not resolve the issue as the point in time construct is repeated for multiple other features i.e. feature_a, feature_b.

Why is Using index; Using temporary; Using filesort performed?

Why does the LIMIT 0,1 not affect the rows scanned?

Best Answer

The temporary table is caused because you have a dependent subquery in your join to calculate the MAX(startdate).

I was able to improve the EXPLAIN by adding an index:

ALTER TABLE feature 
  ADD INDEX item_startdate_category (item_id, start_date, category_id);

And changing the query to avoid the dependent subquery:

EXPLAIN EXTENDED SELECT i.id, f1.id, c.id   
FROM item i 
LEFT JOIN feature f1 ON (f1.item_id = i.id) 
LEFT JOIN feature f2 ON (f1.item_id=f2.item_id AND f1.start_date < f2.start_date) 
LEFT JOIN category c ON (c.id = f1.category_id ) 
WHERE f2.item_id IS NULL
ORDER BY i.id DESC LIMIT 0, 1;

This join between f1 and f2 is meant to ensure that f1 has the largest start_date for its respective item_id. If the LEFT JOIN to f2 finds no other row with the same item_id and a greater start_date, then f1 must have the greatest start_date.

Here's the EXPLAIN I got:

+----+-------------+-------+------------+--------+--------------------------------------------------------+-------------------------+---------+---------------------+------+----------+--------------------------+
| id | select_type | table | partitions | type   | possible_keys                                          | key                     | key_len | ref                 | rows | filtered | Extra                    |
+----+-------------+-------+------------+--------+--------------------------------------------------------+-------------------------+---------+---------------------+------+----------+--------------------------+
|  1 | SIMPLE      | i     | NULL       | index  | NULL                                                   | PRIMARY                 | 4       | NULL                |    1 |   100.00 | Using index              |
|  1 | SIMPLE      | f1    | NULL       | ref    | tbl_feature_id_item_id_foreign,item_startdate_category | item_startdate_category | 5       | test.i.id           |    2 |   100.00 | Using index              |
|  1 | SIMPLE      | f2    | NULL       | ref    | tbl_feature_id_item_id_foreign,item_startdate_category | item_startdate_category | 5       | test.f1.item_id     |    2 |   100.00 | Using where; Using index |
|  1 | SIMPLE      | c     | NULL       | eq_ref | PRIMARY                                                | PRIMARY                 | 4       | test.f1.category_id |    1 |   100.00 | Using index              |
+----+-------------+-------+------------+--------+--------------------------------------------------------+-------------------------+---------+---------------------+------+----------+--------------------------+

I tested on MySQL 8.0.0-dmr on my Macbook.