MySQL – Is it possible to reduce a group to a row based on some criteria

aggregate, group by, MySQL

I'd like to perform a select on a table involving a GROUP BY such that all rows that share the same set of identifiers are grouped together, but I want to reduce each group to one of its rows based on some criterion, for example the row with the maximum date_added. However, other columns may differ among the grouped rows, and I want all of those columns to resolve to the values from the row with the max date_added as well.

I realize to get the max date_added I could simply SELECT MAX(date_added), but that is just a column-level aggregate function. Is there any way I can resolve the entire row in a group?

Conceptually, if you imagine each group as a separate table, I want to SELECT * WHERE date_added=(SELECT MAX(date_added)) from that group table.

Best Answer

The traditional solution, the one you may find in books, is a self join: first find the maximum date per group, then join that result back to the same table to pick up the rows that have that maximum date.
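A minimal sketch of that self join, assuming the table is t, the grouping column is whatever, and the date column is date_added (the same placeholder names used in the query further down). The alias latest and the column max_date_added are names made up for illustration.

```sql
-- Step 1 (derived table): the maximum date_added per group.
-- Step 2 (join): pull the full rows of t that carry that date.
SELECT t.*
FROM t
JOIN (
  SELECT whatever, MAX(date_added) AS max_date_added
  FROM t
  GROUP BY whatever
) AS latest
  ON latest.whatever = t.whatever
 AND latest.max_date_added = t.date_added;
```

If two rows in a group share the same maximum date_added, this returns both of them, so a tie-breaker would be needed where exactly one row per group is required.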

However, some hacks allow you to avoid that. Consider the following query:

SELECT
  MAX(date_added) AS date_added,
  SUBSTRING_INDEX(GROUP_CONCAT(some_column ORDER BY date_added DESC), ',', 1) AS some_column,
  SUBSTRING_INDEX(GROUP_CONCAT(another_column ORDER BY date_added DESC), ',', 1) AS another_column
FROM t
GROUP BY whatever

GROUP_CONCAT is an aggregate function that concatenates the values of a group into a single string, comma-separated by default. It accepts an ORDER BY clause, which we use here (ORDER BY date_added DESC) so that the value from the newest row comes first. We then extract that first token from the string with SUBSTRING_INDEX.

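The downside here (apart from the query looking rather frightening) is that your numerical values are converted to text. That is typically no big deal in MySQL, but please be aware of it. Also note that a value containing a comma breaks the token extraction unless you pick a different SEPARATOR, that GROUP_CONCAT skips NULL values (so a NULL in the newest row yields a value from an older row), and that the concatenated string is truncated at group_concat_max_len (1024 bytes by default), which matters if the first value is itself long.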
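As a hedged illustration of working around the text conversion, the variant below casts the extracted token back to an integer and uses an explicit separator. The column name some_numeric_column is hypothetical, and the '|' separator assumes that character never appears in the data.

```sql
-- Optional: raise the GROUP_CONCAT length limit for this session only.
SET SESSION group_concat_max_len = 1000000;

SELECT
  whatever,
  MAX(date_added) AS date_added,
  -- Take the first '|'-separated token (the newest row's value) and cast it back to an integer.
  CAST(
    SUBSTRING_INDEX(
      GROUP_CONCAT(some_numeric_column ORDER BY date_added DESC SEPARATOR '|'),
      '|', 1
    ) AS SIGNED
  ) AS some_numeric_column
FROM t
GROUP BY whatever;
```

CAST(... AS SIGNED) suits integer columns; other types would need a different target type, such as DECIMAL or DATETIME.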

See also my old post: Selecting a specific non aggregated column data in GROUP BY

There's another option where you do a semi-self-join, a much lighter one, though you will have to give up usage of an index. It's quite long to describe: it still uses GROUP_CONCAT and SUBSTRING_INDEX, but only to build a derived table containing just the relevant keys. This derived table is then joined with the original table. See an example in SQL: selecting top N records per group, another solution.
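One possible reading of that description, for the single-row-per-group case, is sketched below. It assumes t has an integer primary key named id, which the question does not state, and it is not necessarily the exact query from the linked post.

```sql
-- The derived table holds only the key of the newest row in each group;
-- the outer join then fetches the full rows by that key.
SELECT t.*
FROM t
JOIN (
  SELECT
    CAST(
      SUBSTRING_INDEX(GROUP_CONCAT(id ORDER BY date_added DESC), ',', 1)
      AS UNSIGNED
    ) AS id
  FROM t
  GROUP BY whatever
) AS latest
  ON latest.id = t.id;
```

Because only one key per group is extracted, this returns exactly one row per group even when several rows tie on date_added, although which of the tied rows is chosen is not defined.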