How to group data within a time span

gaps-and-islands

I have a group of rows sorted by date as below:

2017-01-01 
2017-01-01
2017-01-02
2017-01-03
2017-01-04
2017-01-17
2017-01-18
2017-01-18
2017-01-18
2017-01-19
2017-01-19
2017-01-26
2017-01-27
2017-01-27

DDL & DML for those willing to help:

CREATE TABLE foo ( DateCol date );
INSERT INTO foo ( DateCol )
VALUES
  ( '2017-01-01' ),
  ( '2017-01-01' ),
  ( '2017-01-02' ),
  ( '2017-01-03' ),
  ( '2017-01-04' ),
  ( '2017-01-17' ),
  ( '2017-01-18' ),
  ( '2017-01-18' ),
  ( '2017-01-18' ),
  ( '2017-01-19' ),
  ( '2017-01-19' ),
  ( '2017-01-26' ),
  ( '2017-01-27' ),
  ( '2017-01-27' )
);

I am trying to group the rows and add an additional column to mark the group. The criterion is that any two rows should be in one group if the time gap between them is less than 3 days, something like below:

2017-01-01  A
2017-01-01  A
2017-01-02  A
2017-01-03  A
2017-01-04  A
2017-01-17  B
2017-01-18  B
2017-01-18  B
2017-01-18  B
2017-01-19  B
2017-01-19  B
2017-01-26  C
2017-01-27  C
2017-01-27  C

How can this be achieved?

Best Answer

Window Function Method

SELECT date,
  chr(
    65 + CAST(count(reset) OVER (ORDER BY date) AS int)
  ) AS dategrp
FROM (
  SELECT date, CASE WHEN date - lag(date) OVER () > 3 THEN 1 END AS reset
  FROM foo
  ORDER BY date
) AS t

First we generate resets with a window function

SELECT date, CASE WHEN date - lag(date) OVER () > 3 THEN 1 END AS reset
FROM foo
ORDER BY date
    date    | reset 
------------+-------
 2017-01-01 |      
 2017-01-01 |      
 2017-01-02 |      
 2017-01-03 |      
 2017-01-04 |      
 2017-01-17 |     1
 2017-01-18 |      
 2017-01-18 |      
 2017-01-18 |      
 2017-01-19 |      
 2017-01-19 |      
 2017-01-26 |     1
 2017-01-27 |      
 2017-01-27 |      

Then we generate numeric groups

SELECT date, count(reset) OVER (ORDER BY date)
FROM (
  SELECT date, CASE WHEN date - lag(date) OVER () > 3 THEN 1 END AS reset
  FROM foo
  ORDER BY date
) AS t
    date    | count 
------------+-------
 2017-01-01 |     0
 2017-01-01 |     0
 2017-01-02 |     0
 2017-01-03 |     0
 2017-01-04 |     0
 2017-01-17 |     1
 2017-01-18 |     1
 2017-01-18 |     1
 2017-01-18 |     1
 2017-01-19 |     1
 2017-01-19 |     1
 2017-01-26 |     2
 2017-01-27 |     2
 2017-01-27 |     2

Then we use chr(65+) to get the alpha group names you want. Original query at the top..

    date    | dategrp 
------------+---------
 2017-01-01 | A
 2017-01-01 | A
 2017-01-02 | A
 2017-01-03 | A
 2017-01-04 | A
 2017-01-17 | B
 2017-01-18 | B
 2017-01-18 | B
 2017-01-18 | B
 2017-01-19 | B
 2017-01-19 | B
 2017-01-26 | C
 2017-01-27 | C
 2017-01-27 | C
(14 rows)