MySQL – Get Count of Consecutive Dates

group bymariadbMySQL

I want to get a list of dates where there are consecutive dates and it's corresponding count.

For example, if the I have the following data set

Date
2021-07-28
2021-07-27
2021-07-26
2021-07-25
2021-07-24
2021-07-23
2021-07-22
2021-07-21
2021-07-18
2021-07-17
2021-07-14
2021-07-11
2021-07-09
2021-07-06
2021-07-04
2021-07-03
2021-07-02

The result I would like is where consecutive date count > x

2021-07-21  8
2021-07-17  2
2021-07-02  3

I'm not really sure how to approach this problem. If an explanation could be provided with the query that would be great, although not required.

Best Answer

As correctly noted by Charlieface, this is a Gaps and Islands problem. Another way of solving this specific variation – also involving a window function, though a different one this time – would go like this:

WITH
  partitioned AS
  (
    SELECT
      *
    , DATEDIFF(Date, '1970-01-01') - ROW_NUMBER() OVER (ORDER BY Date ASC) AS PartID
    FROM
      YourTable
  )
SELECT
  MIN(Date) AS StartDate
, COUNT(*)  AS DayCount
FROM
  partitioned
GROUP BY
  PartID
HAVING
  COUNT(*) > 1
ORDER BY
  PartID
;

This solution relies on the fact that the difference between a representation of a date as an integer (DATEDIFF(...)) and the date's numerical position in an ordered sequence (ROW_NUMBER() OVER ...) is a constant value. If we looked at the intermediate values returned by the functions in the PartID expression, we would find the following:

Date DATEDIFF(Date, '1970-01-01') ROW_NUMBER() OVER (ORDER BY Date ASC) PartID
2021-07-02 18810 1 18809
2021-07-03 18811 2 18809
2021-07-04 18812 3 18809
2021-07-06 18814 4 18810
2021-07-09 18817 5 18812
2021-07-11 18819 6 18813
2021-07-14 18822 7 18815
2021-07-17 18825 8 18817
2021-07-18 18826 9 18817
2021-07-21 18829 10 18819
2021-07-22 18830 11 18819
2021-07-23 18831 12 18819
2021-07-24 18832 13 18819
2021-07-25 18833 14 18819
2021-07-26 18834 15 18819
2021-07-27 18835 16 18819
2021-07-28 18836 17 18819

As you can see, the difference between DATEDIFF and ROW_NUMBER (represented by the column PartID) is the same where dates are consecutive, and it is different for different sequences, which makes it a perfect candidate for a GROUP BY criterion. And that is exactly what the query is using it for. By the way, the date 1970-01-01 has no specific meaning in this case. Any date could be used instead of it as long as it is a constant value.

Another important note to make – and it makes this answer substantially different from Charlieface's suggestion – is that all the dates must be unique for the method to work as expected.

A live demo of this solution can be found at db<>fiddle.