Mysql – InnoDB – Use combined index with primary key on GROUP BY

indexinnodbMySQL

As far as I know, an InnoDB index always contains the primary index.

I cannot get the following to work:

SELECT t1.a FROM table_1 t1 LEFT JOIN table_2 t2 on t1.id = t2.refid 
WHERE t1.b = 99
GROUP BY t1.id
ORDER BY t1.c

I have these keys:

t1.id = PRIMARY KEY
(b,c) = key

When I EXPLAIN this I get "Using where; Using temporary; Using filesort".

I do use my index on (b,c) but it doesn't use the primary index in that index to get rid of the temp table and filesort.

When I change the GROUP BY to GROUP BY c, I get rid of the temp table and filesort.

Added the primary key field to key (b,c) did not help either.

What am i doing wrong?


Update

Actual query:

SELECT q.id AS q__id, q.user_id AS q__user_id, q.title AS q__title, q. NAME AS q__name
     , COUNT(r.id) AS r__0   
FROM questionnaire q 
  LEFT JOIN response r 
    ON q.id = r.questionnaire_id 
WHERE ( q.user_id = 2 ) 
GROUP BY q.id 
ORDER BY q.updated_at DESC ;

Tables:

CREATE TABLE `questionnaire` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `user_id` int(10) unsigned DEFAULT NULL,
  `title` varchar(150) COLLATE utf8_unicode_ci NOT NULL,
  `name` varchar(18) COLLATE utf8_unicode_ci NOT NULL,
  `description` text COLLATE utf8_unicode_ci,
  `thankyou` text COLLATE utf8_unicode_ci,
  `status` enum('inactive','active') COLLATE utf8_unicode_ci NOT NULL DEFAULT 'inactive',
  `published_at` datetime DEFAULT NULL,
  `license_id` int(10) unsigned DEFAULT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `deleted_at` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `Unique_name_idx` (`user_id`,`name`) USING BTREE,
  KEY `questionnaire_license_id_license_id` (`license_id`),
  KEY `deleted` (`deleted_at`),
  KEY `updated_deleted` (`user_id`,`updated_at`) USING BTREE,
  CONSTRAINT `questionnaire_license_id_license_id` 
    FOREIGN KEY (`license_id`) 
    REFERENCES `license` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=13 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE `response` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `questionnaire_id` int(10) unsigned NOT NULL,
  `ip` varchar(39) DEFAULT NULL,
  `headers` text,
  `created_at` datetime NOT NULL,
  PRIMARY KEY (`id`),
  KEY `response_questionnaire_id_questionnaire_id` (`questionnaire_id`),
  CONSTRAINT `response_questionnaire_id_questionnaire_id` 
    FOREIGN KEY (`questionnaire_id`) 
    REFERENCES `questionnaire` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=latin1;

Explain for GROUP BY IdWith Group BY Id
Explain for GROUP BY update_atWith Group By update_at

Best Answer

Because you only select columns from table_1 and because table_2 is only joined (on the right side) of the LEFT JOIN and none of its columns is used elsewhere (SELECT, WHERE, GROUP BY or ORDER BY clauses), you can completely remove the LEFT JOIN table_2 t2 on t1.id = t2.refid part.

This might have given less rows in the result but because you group by the Primary key of table_1 after the join, there is no such case.

SELECT t1.a 
FROM table_1 t1 
WHERE t1.b = 99
GROUP BY t1.id
ORDER BY t1.c ;

Now, because you group by the Primary key of table_1, which is redundant when you have only one table, you can also remove that part: GROUP BY t1.id

Finally, the query is equivalent to:

SELECT t1.a 
FROM table_1 t1 
WHERE t1.b = 99
ORDER BY t1.c ;

which should use the index on (b, c), which yes, includes the primary key. You may consider it to be (b, c, id) if you want.

But whether that index will be used is not 100% sure. Depending on the selectivity of column b (what percent of the whole table has b=99?) this index may or may not be used. If there a lot of rows with b=99 (a large percentage), the optimizer may choose to scan the whole table instead and do a filesort, than use the index, select those (99,c) combinations that exist and then hit the table to find the a values.

If you have an index on (b, c, a) or on (b, c, id, a), the query will be able to find all the info that is needed in this index and in the correct order, so it will use it.