Postgresql – Selecting row grouped by an “add-on” indicator

gaps-and-islandspostgresqlpostgresql-9.6

I have a (pre-existing, no I did not design it like this!) schema / dataset like this:

CREATE TABLE t(id,name,seq,addon,kind)
AS VALUES
    ( 1234::int, 'A' , 345::int, 'f'::bool, 'c'  ),
    ( 1235     , NULL, 346     , 't'      , NULL ),
    ( 1236     , NULL, 347     , 't'      , NULL ),
    ( 2345     , 'B' , 348     , 'f'      , 'x'  ),
    ( 2346     , NULL, 349     , 't'      , NULL ),
    ( 3456     , 'C' , 350     , 'f'      , 'c'  ),
    ( 3457     , NULL, 351     , 't'      , NULL );

Where:

The "id" field is just a SQL sequence (order not guaranteed)
The "seq" field is guaranteed to be sequential throughout the table, e.g. table is normally sorted on this column.
The "addon" field is indicating a group of related entries, e.g. TRUE entries are related to the nearest FALSE entry with a lower "seq".

I want to SELECT all the rows with "kind" = 'x' and all of their associated "addon" rows, without getting any other "addon" rows for other kinds.

So for instance:

SELECT * FROM t WHERE kind = 'x' AND ???

Would give me:

  id  | name | seq  |addon | kind |
------+------+------+------+------+...
 2345 | "B"  | 348  |  f   | "x"  |
 2346 |      | 349  |  t   |      |

I have PostgreSQL 9.6…

Best Answer

You just need to create a real grouping. Here we use a Window Function count(x) which counts all things NOT NULL. Try it with this simplified example,

SELECT count(a) OVER ()
FROM ( VALUES (true),(false) ) AS t(a);

In order to count just what we want, we use OR which evaluates the rhs if the lhs is not true. So here we test if the row IS FALSE (returning true if the value is false), if the condition does not evaluate to true (meaning the value is true or null) we return null (so count(x) skips it). here is the full query,

SELECT *,
  count((addon IS FALSE) OR NULL) OVER (ORDER BY seq) AS grp
FROM t

  id  | name | seq | addon | kind | grp 
------+------+-----+-------+------+-----
 1234 | A    | 345 | f     | c    |   1
 1235 |      | 346 | t     |      |   1
 1236 |      | 347 | t     |      |   1
 2345 | B    | 348 | f     | x    |   2
 2346 |      | 349 | t     |      |   2
 3456 | C    | 350 | f     | c    |   3
 3457 |      | 351 | t     |      |   3
(7 rows)

now you can easily do anything you want..

WITH t AS (
  SELECT *,
    count((addon IS FALSE) OR NULL) OVER (ORDER BY seq) AS grp
  FROM t
)
SELECT *
FROM t AS t1
WHERE EXISTS (
  SELECT 1
  FROM t AS t2
  WHERE addon IS FALSE
    AND kind = 'x'
    AND t1.grp = t2.grp
)
ORDER BY seq;

To see this pattern elsewhere, check out the gaps-and-islands and postgresql, using this search

Related Solutions

MySQL – Finding Contiguous Ranges in Grouped Data

The answer with the variables is going to be more efficient but here is an answer with pure SQL:

select 
    a.id_user, 
    a.id_ringtype, 
    a.number      as min,
    min(b.number) as max
from 
    rings as a 
  join rings as b 
    on  a.id_user = b.id_user 
    and a.id_ringtype = b.id_ringtype 
    and a.number <= b.number 
where not exists 
      ( select 1 
        from rings as c 
        where c.id_user = a.id_user 
          and c.id_ringtype = a.id_ringtype 
          and c.number = a.number - 1
      )
  and not exists 
      ( select 1 
        from rings as d 
        where d.id_user = b.id_user 
          and d.id_ringtype = b.id_ringtype 
          and d.number = b.number + 1
      ) 
group by 
    a.id_user, 
    a.id_ringtype, 
    a.number ;

Efficiency will depend on many factors (mainly distibution of data) but an index on (id_user, id_ringtype, number) is essential for this query.

Postgresql – Selecting latest row for user grouped by day help

One way (not necessarily the most efficient) is to use the Postgres special DISTINCT ON modifier in a subquery, and then sum:

SELECT
    SUM(score) AS score, 
    COUNT(*) AS count,
    created_at_date
FROM
  ( SELECT DISTINCT ON (user_id, created_at::date) 
        created_at::date AS created_at_date, 
        (DATA ->> 'score')::integer AS score
    FROM 
        scores
    ORDER BY 
        user_id, created_at::date, created_at DESC
  ) AS g
GROUP BY 
    created_at_date ;

Tested at SQLfiddle.

Best Answer

Related Solutions

MySQL – Finding Contiguous Ranges in Grouped Data

Postgresql – Selecting latest row for user grouped by day help

Related Question