Postgresql – return continuous set of IDs based on two other columns

gaps-and-islandspostgresql

Given a table such as the one below, how can I ask PostgreSQL to return the first set of n rows in which one column has a certain value (code=0) and the values of the second column (named segment) are all the same, and the sequence of IDs is not broken.

For n=4, the result should be id's 12, 13, 14, 15 (i.e. the first four id's where code=0 and the values of segment are the same – in this case 2).

CREATE TABLE foo
AS
  SELECT *
  FROM ( VALUES
    (  1, 0, 0 ),
    (  2, 0, 0 ),
    (  3, 0, 0 ),
    (  4, 0, 1 ),
    (  5, 0, 1 ),
    (  6, 0, 1 ),
    (  7, 1, 1 ),
    (  8, 0, 1 ),
    (  9, 1, 2 ),
    ( 10, 0, 2 ),   
    ( 11, 1, 2 ),
    ( 12, 0, 2 ), -- HERE
    ( 13, 0, 2 ), -- HERE
    ( 14, 0, 2 ), -- HERE
    ( 15, 0, 2 ), -- HERE
    ( 16, 0, 3 )
  ) AS t(id,code,segment);

Best Answer

You can do something like this..

SELECT
  array_agg(id),
  code,
  segment,
  grp,
  count(*)
FROM (
  SELECT
    id,
    code,
    segment,
    count(is_reset) OVER (ORDER BY id) AS grp
  FROM (
    SELECT
      id,
      code,
      segment,
      CASE
        WHEN
          code <> lag(code) OVER (ORDER BY id)
          OR segment <> lag(segment) OVER (ORDER BY id)
        THEN 1
      END AS is_reset
    FROM foo
  ) AS t
) AS g
GROUP BY code, segment, grp;

   array_agg   | code | segment | grp | count 
---------------+------+---------+-----+-------
 {11}          |    1 |       2 |   6 |     1
 {10}          |    0 |       2 |   5 |     1
 {1,2,3}       |    0 |       0 |   0 |     3
 {7}           |    1 |       1 |   2 |     1
 {4,5,6}       |    0 |       1 |   1 |     3
 {16}          |    0 |       3 |   8 |     1
 {8}           |    0 |       1 |   3 |     1
 {12,13,14,15} |    0 |       2 |   7 |     4
 {9}           |    1 |       2 |   4 |     1
(9 rows)

To query this, just search for code = 0 AND count >= 4.

Explanation

First we generate a reset

SELECT
  id,
  code,
  segment,
  CASE
    WHEN
      code <> lag(code) OVER (ORDER BY id)
      OR segment <> lag(segment) OVER (ORDER BY id)
    THEN 1
  END AS is_reset
FROM foo

 id | code | segment | is_reset 
----+------+---------+----------
  1 |    0 |       0 |         
  2 |    0 |       0 |         
  3 |    0 |       0 |         
  4 |    0 |       1 |        1
  5 |    0 |       1 |         
  6 |    0 |       1 |         
  7 |    1 |       1 |        1
  8 |    0 |       1 |        1
  9 |    1 |       2 |        1
 10 |    0 |       2 |        1
 11 |    1 |       2 |        1
 12 |    0 |       2 |        1
 13 |    0 |       2 |         
 14 |    0 |       2 |         
 15 |    0 |       2 |         
 16 |    0 |       3 |        1
(16 rows)

Then we create groups,

  SELECT
    id,
    code,
    segment,
    count(is_reset) OVER (ORDER BY id) AS grp
  FROM (
    SELECT
      id,
      code,
      segment,
      CASE
        WHEN
          code <> lag(code) OVER (ORDER BY id)
          OR segment <> lag(segment) OVER (ORDER BY id)
        THEN 1
      END AS is_reset
    FROM foo
  ) AS t;
 id | code | segment | grp 
----+------+---------+-----
  1 |    0 |       0 |   0
  2 |    0 |       0 |   0
  3 |    0 |       0 |   0
  4 |    0 |       1 |   1
  5 |    0 |       1 |   1
  6 |    0 |       1 |   1
  7 |    1 |       1 |   2
  8 |    0 |       1 |   3
  9 |    1 |       2 |   4
 10 |    0 |       2 |   5
 11 |    1 |       2 |   6
 12 |    0 |       2 |   7
 13 |    0 |       2 |   7
 14 |    0 |       2 |   7
 15 |    0 |       2 |   7
 16 |    0 |       3 |   8
(16 rows)

Now you'll notice that the groups a fn(code,segment), reseting when either changes. Now we group by the grp, and do a count(*), but because you want the ids we run them through array_agg and store them as an array.

Note, you say and the sequence of IDs is not broken. I'm working off the assumption that this is not what you meant. PostgreSQL doesn't have a concept of gapless sequences, so that's not a good idea anyway. I assume you wanted what I've done.