MySQL – SELECT LIMIT 1 Per Column Value

greatest-n-per-groupMySQL

Lets say I have the following table

-----------------------------
| user_id   | comment       |
-----------------------------
| 2         | thats cool    |
| 2         | awesome       |
| 3         | i hate this   |
| 3         | okay          |
| 6         | this is weird |
| 6         | hello?        |
| 6         | what is it    |
| 9         | how are you   |
| 16        | too slow      |
| 16        | yes           |
| 17        | alrighty      |
-----------------------------

How can you select one row per user_id? So my results would be:

-----------------------------
| user_id   | comment       |
-----------------------------
| 2         | thats cool    |
| 3         | i hate this   |
| 6         | this is weird |
| 9         | how are you   |
| 16        | too slow      |
| 17        | alrighty      |
-----------------------------

Is this possible with a single efficient query? Or are sub-selected necessary? Is it possible to somehow use DISTINCT on a single column?

Best Answer

That's what GROUP BY is used for. Get one row (per group). In this case, it will show all distinct user_id values and for the rest of the columns, you can (have to) use aggregate functions like MIN(), MAX(), AVG(), SUM() as you will have more than one values per group and only one can be shown.

SELECT
    user_id
  , MIN(comment) AS comment  -- it will show the first in alphabetical order  
                             -- you could also use MAX()
FROM
    tableX
GROUP BY
    user_id ;

MySQL allows also the following unorthodox solution, that will return one (more or less random) comment per user:

SELECT
    user_id
  , comment
FROM
    tableX
GROUP BY
    user_id ;

This last query will not work but raise an error if the (stricter) ONLY_FULL_GROUP_BY mode is enabled. In the recently released 5.7 version, this mode is the default and a new function, ANY_VALUE(), is provided. For more details, see the MySQL Handling of GROUP BY page. The query can be written now:

SELECT
    user_id
  , ANY_VALUE(comment) AS comment
FROM
    tableX
GROUP BY
    user_id ;

Note that with either the "unorthodox" version or using the recent ANY_VALUE() function, if we add more columns in the SELECT list, their values is not guaranteed to be from the same row, just from a row in the same group. The way they are selected is not exactly random, depends on the execution plan and the indexes used.