PostgreSQL – How to Assign Group IDs Depending on Content

gaps-and-islandspostgresqlwindow functions

I have the following table with data as shown below; linenrs are monotonically increasing but not necessarily consecutive; when the key field contains an ellipsis ... that indicates an entry that is continued from above:

create table source (
  linenr    integer unique not null,
  key       text not null,
  value     text );

insert into source values
  (  2, 'tags',  'a'          ),
  (  3, '...',   'b'          ),
  (  4, 'title', 'The Title'  ),
  (  5, 'note',  'this is'    ),
  (  6, '...',   'an EXAMPLE' ),
  (  8, 'title', 'over'       ),
  (  9, '...',   'three'      ),
  ( 10, '...',   'lines'      ),
  ( 11, 'about', 'grouping'   );

Now I want to have a view where a group number is assigned based on the content of the key field; group numbers do not have to be consecutive but should be distinct for each group of rows that starts with a key other than ... and continues thru all lines where key is ..., like this:

╔════════╤═══════╤═══════╤════════════╗
║ linenr │ group │  key  │   value    ║
╠════════╪═══════╪═══════╪════════════╣
║      2 │     1 │ tags  │ a          ║
║      3 │     1 │ ...   │ b          ║
║      4 │     2 │ title │ The Title  ║
║      5 │     3 │ note  │ this is    ║
║      6 │     3 │ ...   │ an EXAMPLE ║
║      8 │     4 │ title │ over       ║
║      9 │     4 │ ...   │ three      ║
║     10 │     4 │ ...   │ lines      ║
║     11 │     5 │ about │ grouping   ║
╚════════╧═══════╧═══════╧════════════╝

I tried to do this with windows / partitions and the tabibitosan pattern, but haven't been able to come up with anything working; also, using at the preceding row with lag() doesn't help where there are multiple consecutive lines with .... In a spreadsheet this is an extremely easy thing to do but in SQL I can seemingly not refer to the previous row of the current query, can I?

Best Answer

create table source (
  linenr    integer unique not null,
  key       text not null,
  value     text );

insert into source values
  (  2, 'tags',  'a'          ),
  (  3, '...',   'b'          ),
  (  4, 'title', 'The Title'  ),
  (  5, 'note',  'this is'    ),
  (  6, '...',   'an EXAMPLE' ),
  (  8, 'title', 'over'       ),
  (  9, '...',   'three'      ),
  ( 10, '...',   'lines'      ),
  ( 11, 'about', 'grouping'   );

You can get it using a simple CASE.

select linenr, key, value,
       case when key <> '...' then 1 end as rst
from   source;
linenr | key   | value      |  rst
-----: | :---- | :--------- | ---:
     2 | tags  | a          |    1
     3 | ...   | b          | null
     4 | title | The Title  |    1
     5 | note  | this is    |    1
     6 | ...   | an EXAMPLE | null
     8 | title | over       |    1
     9 | ...   | three      | null
    10 | ...   | lines      | null
    11 | about | grouping   |    1

Now SUM(rst) ignores null values and do the trick.

select linenr, key, value,
       sum(rst) over (order by linenr) grp
from (
      select linenr, key, value,
             case when key <> '...' then 1 end as rst
      from   source
     ) x;
linenr | key   | value      | grp
-----: | :---- | :--------- | --:
     2 | tags  | a          |   1
     3 | ...   | b          |   1
     4 | title | The Title  |   2
     5 | note  | this is    |   3
     6 | ...   | an EXAMPLE |   3
     8 | title | over       |   4
     9 | ...   | three      |   4
    10 | ...   | lines      |   4
    11 | about | grouping   |   5

dbfiddle here