A table files
looks like:
id, file_name, vendor_id, created
1, z.txt, 2, 2011
2, z.txt, 2, 2011
3, a.txt, 1, 2016
4, a.txt, 1, 2016
5, a.txt, 1, 2016
6, b.txt, 1, 2015
7, y.txt, 2, 2015
8, x.txt, 2, 2014
There is an index on: file_name
, vendor_id
and created
.
I want a list of the latest files (newest_file
) of each vendor_id
in a single query.
SELECT vendor_id, SUBSTRING_INDEX(GROUP_CONCAT(file_name ORDER BY created DESC), ',', 1) AS newest_file
FROM files GROUP BY vendor_id
This query works perfect, but is very slow. Is there a faster alternative in a single query?
CREATE TABLE `files` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`created` datetime DEFAULT NULL,
`file_name` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
`vendor_id` int(11) NOT NULL DEFAULT '0',
`description` mediumtext CHARACTER SET utf8,
`color` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `vendor_id` (`vendor_id`),
KEY `file_name` (`file_name`),
KEY `created` (`created`),
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1
Best Answer
The best index for this query would be on
(vendor_id, created, file_name)
, with this specific order on the indexed columns.There are various other methods to rewrite this type of queries, without using
GROUP_CONCAT()
. There is even a tag, at SO and this site: greatest-n-per-group.One way to rewrite (needs the same index as above). It assumes that
(vendor_id, created)
isUNIQUE
(ie. there are no two rows with same vendor and created) or it isn't unique and you want all the duplicate/tied results:If
(vendor_id, created)
is notUNIQUE
and/or you want just one row of the duplicates, change the joining condition to: