Mysql – Single query to get only newest entry of grouped redundant entries

A table files looks like:

id, file_name, vendor_id, created
1,  z.txt,     2,         2011
2,  z.txt,     2,         2011
3,  a.txt,     1,         2016
4,  a.txt,     1,         2016
5,  a.txt,     1,         2016
6,  b.txt,     1,         2015
7,  y.txt,     2,         2015
8,  x.txt,     2,         2014

There is an index on: file_name, vendor_id and created.

I want a list of the latest files (newest_file) of each vendor_id in a single query.

SELECT vendor_id, SUBSTRING_INDEX(GROUP_CONCAT(file_name ORDER BY created DESC), ',', 1) AS newest_file 
FROM files GROUP BY vendor_id

This query works perfect, but is very slow. Is there a faster alternative in a single query?

CREATE TABLE `files` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `created` datetime DEFAULT NULL,
  `file_name` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  `vendor_id` int(11) NOT NULL DEFAULT '0',
  `description` mediumtext CHARACTER SET utf8,
  `color` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `vendor_id` (`vendor_id`),
  KEY `file_name` (`file_name`),
  KEY `created` (`created`),
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1

SELECT v.vendor_id, f.file_name AS newest_file, f.created -- you can have this in the results as well FROM ( SELECT vendor_id -- you can replace the "v" subquery with FROM files -- a single "vendors" table, GROUP BY vendor_id -- if there is one ) AS v JOIN files AS f ON f.vendor_id = v.vendor_id AND f.created = ( SELECT fi.created FROM files AS fi WHERE fi.vendor_id = v.vendor_id ORDER BY fi.created DESC LIMIT 1 ) ;

Best Answer

The best index for this query would be on (vendor_id, created, file_name), with this specific order on the indexed columns.

There are various other methods to rewrite this type of queries, without using GROUP_CONCAT(). There is even a tag, at SO and this site: greatest-n-per-group.

One way to rewrite (needs the same index as above). It assumes that (vendor_id, created) is UNIQUE (ie. there are no two rows with same vendor and created) or it isn't unique and you want all the duplicate/tied results:

If (vendor_id, created) is not UNIQUE and/or you want just one row of the duplicates, change the joining condition to:

--  the query as it is
-- 
  JOIN files AS f
  ON  f.id =
      ( SELECT fi.id 
        FROM files AS fi
--
--  the query as it is

Best Answer

Related Solutions

Mysql – Unexplained InnoDB timeouts

Mysql – Are two indexes needed

Related Question