Mysql – How to write this query

greatest-n-per-groupMySQL

I have a table that looks like this:

CREATE TABLE `message` (

  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,

  `user_id` int(10) unsigned NOT NULL,

  `grp_id` int(10) unsigned NOT NULL,

  `body` varchar(100) COLLATE utf8_unicode_ci NOT NULL,

  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,

  PRIMARY KEY (`created_at`,`grp_id`,`user_id`,`id`),

  KEY `message_user_id_foreign` (`user_id`),

  KEY `message_grp_id_foreign` (`grp_id`),

  KEY `mssg_id_is` (`id`),

  CONSTRAINT `message_grp_id_foreign` FOREIGN KEY (`grp_id`) REFERENCES `groups` (`id`) ON UPDATE CASCADE,

  CONSTRAINT `message_user_id_foreign` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON UPDATE CASCADE

) ENGINE=InnoDB AUTO_INCREMENT=50001 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

I want to fetch the latest 50 MESSAGES for every group that the user is in. This list will be supplied in the query . (something like 8,4,2,1,6,…)

What will the mysql query look like?
The number of groups a user is in is variable.

Best Answer

This is best done by LATERAL JOINs or if it is not available window functions. MySQL unfortunately doesn't support either but the latter is relatively easy to be mimicked.

> set @rownum=0;
> set @grp_lag=null; 
> select id, case when @grp_lag != grp_id then @rownum:=1 else @rownum:=@rownum+1 end as rownum, @grp_lag:=grp_id  
from message 
order by grp_id, created_at desc;

+-------+--------+------------------+
| id    | rownum | @grp_lag:=grp_id |
+-------+--------+------------------+
| 50008 |      1 |                1 |
| 50007 |      2 |                1 |
| 50001 |      3 |                1 |
| 50004 |      4 |                1 |
| 50002 |      1 |                2 |
| 50005 |      2 |                2 |
| 50003 |      1 |                3 |
| 50006 |      1 |                6 |
+-------+--------+------------------+
8 rows in set (0.00 sec)

After having this you can have the final query something like this:

-- Set initial values for mimicing the window function
set @rownum=0;
set @grp_lag=null; 

-- Query every message where rownum < per group limit
select * from (
    -- subquery providing rownumber for every group order by the created_at column
    select *, case when @grp_lag != grp_id then @rownum:=1 else @rownum:=@rownum+1 end as rownum, @grp_lag:=grp_id 
    from message 
    where grp_id in (1,2)
    -- Futher filtering for optimisation if possible
    order by grp_id, created_at desc
) numbered
where rownum < 50;

You can wrap this into a procedure to execute it easier. Please keep in mind, set calls also return and they need to be in the same session (yet another reason for a stored procedure).

> set @rownum=0;
Query OK, 0 rows affected (0.00 sec)

The query above will fetch every row where grp_id matches the in condition and generate the row number for each per group. So in case of large number of messages further optimisations are necessary. For example you can heuristically filter by created_at > [estimated date for having at least 50 message].