MySQL – Get the Most Recent 2 Entries for Each Group by Timestamp

greatest-n-per-groupMySQLmysql-5.7

I am using MySQL 5.7 database with the following columns

  • item_symbol
  • source_date
  • price

I have composite primary keys which are a combination of item_symbol and source_date. I have around 100k entries in the table.

My question here is, for each item_symbol, I would like to select item_symbol and price for the most recent (based on source date) 2 entries from the table.

An example of my table:

item_symbol price source_date
A 20.1 2021-06-10
A 18.2 2021-06-11
A 10.9 2021-06-13
A 21.0 2021-06-15
B 88.2 2021-06-10
B 60.9 2021-06-11
B 78.16 2021-06-13
B 79.0 2021-06-15

Best Answer

I went to my favourite MySQL "tips and tricks" site here and went to the common queries link and looked for the Top N per group section. The great thing about this site is that it tells you how to do stuff in MySQL for all versions - well, going back at least to MySQL 5.5 - and if you're still running that, well...

I came up with the following adapted from above (all of the DDL, DML and SQL below is available on the fiddle here):

I used the DDL and DML from @nbk, kudos to him (and +1):

CREATE TABLE item
(
  item_symbol CHAR(1), 
  price       DECIMAL(10,2), 
  source_date DATE
);

and populate it:

INSERT INTO item
(item_symbol, price, source_date)
VALUES
('A', 20.1, '2021-06-10'),
('A', 18.2, '2021-06-11'),
('A', 10.9, '2021-06-13'),
('A', 21.0, '2021-06-15'),
('B', 88.2, '2021-06-10'),
('B', 60.9, '2021-06-11'),
('B', 78.16, '2021-06-13'),
('B', 79.0, '2021-06-15');

MySQL allows the use of user variables which are a godsend when you don't have capabilities such as the ROW_NUMBER() window function which would have made this query trivial. I would strongly urge you to upgrade to version 8, it has many other goodies - CTEs, CHECK constraints...

Anyway, I'll demonstrate the steps, partly to explain them to you, and partly to explain them to myself! :-)

SELECT
  item_symbol, price, source_date,
  IF 
  (
    @prev <> item_symbol,
    @row_num := 1,
    @row_num := @row_num + 1
  ) AS my_rank,
  @prev := item_symbol
FROM item
JOIN 
  (
    SELECT @row_num := NULL, @prev := 0
  ) AS r
ORDER BY item_symbol, source_date DESC, price DESC;

Result:

item_symbol     price   source_date     my_rank     @prev := item_symbol
          A     21.00   2021-06-15            1     A
          A     10.90   2021-06-13            2     A
          A     18.20   2021-06-11            3     A
          A     20.10   2021-06-10            4     A
          B     79.00   2021-06-15            1     B
          B     78.16   2021-06-13            2     B
          B     60.90   2021-06-11            3     B
          B     88.20   2021-06-10            4     B
8 rows

So, we have the items ('A', 'B') order by date DESC (most recent first) with the price. Note that 12 lines of that query could be replaced by one ROW_NUMBER() function line!

So, now we wrap that in a query, pulling out those results whose my_rank value is <= 2 - which gives us the two most recent dates!

SELECT item_symbol, price, source_date, my_rank  -- this last one is not 
FROM                                             -- required - for clarity...
(
  SELECT
    item_symbol, price, source_date,
    IF 
    (
      @prev <> item_symbol,
      @row_num := 1,
      @row_num := @row_num + 1
    ) AS my_rank,
    @prev := item_symbol
    FROM item
    JOIN (SELECT @row_num := NULL, @prev := 0) AS r
    ORDER BY item_symbol, source_date DESC, price DESC  -- in case of ties!
) AS t
WHERE t.my_rank <= 2
ORDER BY item_symbol, source_date DESC; -- change this as required

Result:

item_symbol     price   source_date     my_rank
          A     21.00   2021-06-15            1
          A     10.90   2021-06-13            2
          B     79.00   2021-06-15            1
          B     78.16   2021-06-13            2

I would suggest that you spend some time browsing the artful softare site!