Mysql – Single query to get only newest entry of grouped redundant entries

greatest-n-per-groupMySQLoptimization

A table files looks like:

id, file_name, vendor_id, created
1,  z.txt,     2,         2011
2,  z.txt,     2,         2011
3,  a.txt,     1,         2016
4,  a.txt,     1,         2016
5,  a.txt,     1,         2016
6,  b.txt,     1,         2015
7,  y.txt,     2,         2015
8,  x.txt,     2,         2014

There is an index on: file_name, vendor_id and created.

I want a list of the latest files (newest_file) of each vendor_id in a single query.

SELECT vendor_id, SUBSTRING_INDEX(GROUP_CONCAT(file_name ORDER BY created DESC), ',', 1) AS newest_file 
FROM files GROUP BY vendor_id

This query works perfect, but is very slow. Is there a faster alternative in a single query?


CREATE TABLE `files` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `created` datetime DEFAULT NULL,
  `file_name` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  `vendor_id` int(11) NOT NULL DEFAULT '0',
  `description` mediumtext CHARACTER SET utf8,
  `color` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `vendor_id` (`vendor_id`),
  KEY `file_name` (`file_name`),
  KEY `created` (`created`),
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1

Best Answer

The best index for this query would be on (vendor_id, created, file_name), with this specific order on the indexed columns.

There are various other methods to rewrite this type of queries, without using GROUP_CONCAT(). There is even a tag, at SO and this site: .

One way to rewrite (needs the same index as above). It assumes that (vendor_id, created) is UNIQUE (ie. there are no two rows with same vendor and created) or it isn't unique and you want all the duplicate/tied results:

SELECT 
    v.vendor_id,
    f.file_name AS newest_file,
    f.created                    -- you can have this in the results as well
FROM 
    ( SELECT vendor_id           -- you can replace the "v" subquery with  
      FROM files                 -- a single "vendors" table,
      GROUP BY vendor_id         -- if there is one
    ) AS v
  JOIN files AS f
  ON  f.vendor_id = v.vendor_id
  AND f.created =
      ( SELECT fi.created
        FROM files AS fi
        WHERE fi.vendor_id = v.vendor_id
        ORDER BY fi.created DESC
        LIMIT 1
      ) ;

If (vendor_id, created) is not UNIQUE and/or you want just one row of the duplicates, change the joining condition to:

--  the query as it is
-- 
  JOIN files AS f
  ON  f.id =
      ( SELECT fi.id 
        FROM files AS fi
--
--  the query as it is