PostgreSQL GROUP BY Possible Sequences Explained

gaps-and-islandspostgresql

I have a table containing the following data, using Postgres 9.6:

log_id | sequence | made_at (timestamp)
206480    1            1
206480    1            2
206480    2            3
206480    3            4
206480    1            5
206480    2            6
206480    4            7
206480    5            8
206480    1            9
206480    2           10
206481    1           11
206481    2           12
206481    3           13
206481    4           14

I have to group and aggregate on the ID so I get an array of possible sequences. In the end I want the data to look like this:

log_id | sequence
206480  {1,1,2,3}
206480  {1,2,4,5}
206480  {1,2}
206481  {1,2,3,4}

I want a new row (with the sequences) when:

  • the log_id changes; or
  • the next sequence number is lower than the current sequence number.

There is another column which specifies the ordering (a timestamp), but it's in another table (I join them and use that timestamp). I left it out to make things easier, but we can assume the column is called made_at.

Best Answer

select      log_id
           ,array_agg (sequence)

from       (select      log_id 
                       ,sequence
                       ,count (is_restart) over
                        (
                            partition by    log_id 
                            order by        made_at
                        ) as restart_id

            from        (select      made_at
                                    ,log_id 
                                    ,sequence
                                    ,case 
                                         when sequence <
                                              lag (sequence) over
                                              (
                                                  partition by    log_id 
                                                  order by        made_at
                                              ) 
                                         then 1
                                     end            is_restart

                         from        logs
                         ) l
            ) l

group by    log_id      
           ,restart_id

order by    log_id      
           ,restart_id
;
+--------+-----------+
| log_id | array_agg |
+--------+-----------+
| 206480 | {1,1,2,3} |
+--------+-----------+
| 206480 | {1,2,4,5} |
+--------+-----------+
| 206480 | {1,2}     |
+--------+-----------+
| 206481 | {1,2,3,4} |
+--------+-----------+

Walkthrough

  • Identify restarts by comparing current sequence to previous sequence (LAG).

    select      made_at
               ,log_id 
               ,sequence
    
               ,case 
                    when sequence <
                         lag (sequence) over
                         (
                             partition by    log_id 
                             order by        made_at
                         ) 
                    then 1
                end            is_restart
    
    from        logs
    
    +---------+--------+----------+------------+
    | made_at | log_id | sequence | is_restart |
    +---------+--------+----------+------------+
    | 1       | 206480 | 1        |            |
    +---------+--------+----------+------------+
    | 2       | 206480 | 1        |            |
    +---------+--------+----------+------------+
    | 3       | 206480 | 2        |            |
    +---------+--------+----------+------------+
    | 4       | 206480 | 3        |            |
    +---------+--------+----------+------------+
    | 5       | 206480 | 1        | 1          |
    +---------+--------+----------+------------+
    | 6       | 206480 | 2        |            |
    +---------+--------+----------+------------+
    | 7       | 206480 | 4        |            |
    +---------+--------+----------+------------+
    | 8       | 206480 | 5        |            |
    +---------+--------+----------+------------+
    | 9       | 206480 | 1        | 1          |
    +---------+--------+----------+------------+
    | 10      | 206480 | 2        |            |
    +---------+--------+----------+------------+
    | 11      | 206481 | 1        |            |
    +---------+--------+----------+------------+
    | 12      | 206481 | 2        |            |
    +---------+--------+----------+------------+
    | 13      | 206481 | 3        |            |
    +---------+--------+----------+------------+
    | 14      | 206481 | 4        |            |
    +---------+--------+----------+------------+
    
  • Do "running counts" (similar to "Running totals") of restarts (is_restart).
    Rows that belongs to the same group will have the same count (AKA restart_id).
    The "Order by" in COUNT implies range between unbounded preceding and current row

    select      log_id 
               ,sequence
               ,count (is_restart) over
                (
                    partition by    log_id 
                    order by        made_at
                ) as group_id
    
    from        (...) l
    
    +--------+----------+----------+
    | log_id | sequence | group_id |
    +--------+----------+----------+
    | 206480 | 1        | 0        |
    +--------+----------+----------+
    | 206480 | 1        | 0        |
    +--------+----------+----------+
    | 206480 | 2        | 0        |
    +--------+----------+----------+
    | 206480 | 3        | 0        |
    +--------+----------+----------+
    | 206480 | 1        | 1        |
    +--------+----------+----------+
    | 206480 | 2        | 1        |
    +--------+----------+----------+
    | 206480 | 4        | 1        |
    +--------+----------+----------+
    | 206480 | 5        | 1        |
    +--------+----------+----------+
    | 206480 | 1        | 2        |
    +--------+----------+----------+
    | 206480 | 2        | 2        |
    +--------+----------+----------+
    | 206481 | 1        | 0        |
    +--------+----------+----------+
    | 206481 | 2        | 0        |
    +--------+----------+----------+
    | 206481 | 3        | 0        |
    +--------+----------+----------+
    | 206481 | 4        | 0        |
    +--------+----------+----------+
    
  • Group by log_id and restart_id and aggregate sequence

    select      log_id
               ,array_agg (sequence)
    
    from       (...) l
    
    group by    log_id      
               ,restart_id
    
    order by    log_id      
               ,restart_id
    ;
    
    +--------+-----------+
    | log_id | array_agg |
    +--------+-----------+
    | 206480 | {1,1,2,3} |
    +--------+-----------+
    | 206480 | {1,2,4,5} |
    +--------+-----------+
    | 206480 | {1,2}     |
    +--------+-----------+
    | 206481 | {1,2,3,4} |
    +--------+-----------+