Postgresql – Combining separate columns into smallest possible related ranges

gaps-and-islandspostgresqlpostgresql-9.6

I'm trying to discard multiple records that may or may not overlap based on the smallest possible contiguous ranges. I thought of doing something similar to This, however the ranges are numeric strings on separate column and i have on the same query 4 more fields where I only need to take the record with the smallest range

Data with simplified fields

    create table invoices(
    eventname varchar,
    /*...many fields*/
    quantity varchar,
    section varchar,
    rownumber varchar,
    secondrow varchar,
    lowseat varchar,
    highseat varchar,
    /*...some more fields*/
    status varchar,
    /*...even more fields*/
    created_at timestamp default now() not null,
    updated_at timestamp
);

INSERT INTO public.invoices (eventname, quantity, section, rownumber, secondrow, lowseat, highseat, status, created_at, updated_at) VALUES ('2018 ACC Basketball Tournament - Session 4 (Miami vs North Carolina and Duke vs Notre Dame)', '2', '227', '15', null, '9', '10', 'DEPLETED' ,  '2019-02-06 00:46:13.286828', null);
INSERT INTO public.invoices (eventname, quantity, section, rownumber, secondrow, lowseat, highseat, status, created_at, updated_at) VALUES ('2018 ACC Basketball Tournament - Session 4 (Miami vs North Carolina and Duke vs Notre Dame)', '2', '227', '15', null, '7', '8', 'DEPLETED'  ,  '2019-02-06 00:46:13.286828', null);
INSERT INTO public.invoices (eventname, quantity, section, rownumber, secondrow, lowseat, highseat, status, created_at, updated_at) VALUES ('2018 ACC Basketball Tournament - Session 4 (Miami vs North Carolina and Duke vs Notre Dame)', '2', '227', '14', null, '23', '24', 'DEPLETED',  '2019-02-06 00:46:13.286828', null);
INSERT INTO public.invoices (eventname, quantity, section, rownumber, secondrow, lowseat, highseat, status, created_at, updated_at) VALUES ('2018 ACC Basketball Tournament - Session 4 (Miami vs North Carolina and Duke vs Notre Dame)', '1', '227', '13', null, '21', '21', 'DEPLETED',  '2019-02-06 00:46:13.286828', null);
INSERT INTO public.invoices (eventname, quantity, section, rownumber, secondrow, lowseat, highseat, status, created_at, updated_at) VALUES ('2018 ACC Basketball Tournament - Session 4 (Miami vs North Carolina and Duke vs Notre Dame)', '8', '227', '14', null, '15', '22', 'DEPLETED',  '2019-02-06 00:46:13.286828', null);
INSERT INTO public.invoices (eventname, quantity, section, rownumber, secondrow, lowseat, highseat, status, created_at, updated_at) VALUES ('2018 ACC Basketball Tournament - Session 4 (Miami vs North Carolina and Duke vs Notre Dame)', '1', '227', '14', null, '1', '1', 'DEPLETED',    '2019-02-06 00:46:13.286828', null);
INSERT INTO public.invoices (eventname, quantity, section, rownumber, secondrow, lowseat, highseat, status, created_at, updated_at) VALUES ('2018 ACC Basketball Tournament - Session 4 (Miami vs North Carolina and Duke vs Notre Dame)', '2', 'A57', 'GA', null, '1', '2', 'DEPLETED',    '2019-02-06 00:46:13.286828', null);
INSERT INTO public.invoices (eventname, quantity, section, rownumber, secondrow, lowseat, highseat, status, created_at, updated_at) VALUES ('2018 ACC Basketball Tournament - Session 4 (Miami vs North Carolina and Duke vs Notre Dame)', '3', 'A57', 'GA', null, '3', '5', 'DEPLETED',    '2019-02-06 00:46:13.286828', null);
INSERT INTO public.invoices (eventname, quantity, section, rownumber, secondrow, lowseat, highseat, status, created_at, updated_at) VALUES ('2018 ACC Basketball Tournament - Session 5 (Virginia vs. Clemson and Duke vs. North Carolina)', '3', '228', '14', null, '1', '3', 'DEPLETED', '2019-02-06 00:46:13.286828', null);
INSERT INTO public.invoices (eventname, quantity, section, rownumber, secondrow, lowseat, highseat, status, created_at, updated_at) VALUES ('Penn State Nittany Lions at Pittsburgh Panthers', '2', '227', 'K', null, '25', '26', 'DEPLETED', '2019-02-06 00:46:13.286828', null);

Visual representation:

Group 1

1 | =====
2 |   ===  --> take this record with all its values

Group 2

3 |    === --> take this record

Group 3

4 |       =======
5 |           ==  --> take this record
6 |         =====
  • Adjacent ranges should be merged.
  • Lower and upper bounds to be inclusive does fits best for seat numbers.

I did the following and it returns the same values for everything so i know its not right

SELECT distinct section, rownumber,
min(COALESCE(lowseat, '')) over 
(partition by grp) as lowseat,
max(maxhighseat) over (partition by grp) AS highseat
FROM  (
SELECT *, count(nextstart > maxhighseat OR NULL) OVER (PARTITION BY section,
rownumber ORDER BY lowseat desc, highseat desc NULLS LAST) AS grp
FROM  (
  SELECT section, rownumber, lowseat, highseat, max(COALESCE(highseat, '')) OVER (PARTITION BY section, rownumber ORDER BY lowseat, highseat) AS maxhighseat
       , lead(lowseat) OVER (PARTITION BY section, rownumber ORDER BY lowseat, highseat) As nextstart
  FROM invoices where status <> 'DEPLETED' and eventname like 'UCLA%'
  ) a
) b
ORDER  BY 1;

Table important fields look like:

 id | section | row | lowseat | highseat | created_at
----+---------------------------------------------------------------
  1 |      14 |  18 |       1 |       15 | 2019-01-01T00:00:00.000Z
  2 |      14 |  18 |       4 |       15 | 2019-01-01T00:00:00.000Z
  3 |      12 |  13 |       2 |       13 | 2019-02-01T00:00:00.000Z
  4 |      14 |  18 |       4 |       12 | 2019-01-01T00:00:00.000Z

Best Answer

This is a classical question. The question itself has still a number of gaps, no pun intended. Filling in with some ...

Assumptions

  • lowseat and highseat seem to be lower & upper bound of your ranges, obviously integer numbers, but stored as varchar. Change that, or you have to add type cast to my following query.

  • You did not define whether adjacent ranges should be merged or separate. Assuming separate, since those are not strictly "overlapping".

  • Assuming lower and upper bounds to be inclusive, fits best for seat numbers.

  • Ignoring query predicates that don't line up with the sample data.

Query

SELECT DISTINCT ON (island) *
FROM  (
   SELECT *
        , highseat - lowseat AS len -- off by 1, but irrelevant
        , count(gap) OVER (ORDER BY rn) AS island
   FROM  (
      SELECT *
           , (lowseat > max(highseat) OVER w) OR NULL AS gap
           , row_number() OVER w AS rn
      FROM   invoices
      WINDOW w AS (ORDER BY lowseat, highseat DESC  -- longest range 1st
                   ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
      ) sub1
   ) sub2
ORDER  BY island, len, lowseat;   -- break ties by picking smallest numbers

db<>fiddle here

This is based on lowseat and highseat, the rest of the row is just ballast.

Related answer with more explanation and an alternative procedural implementation:

About DISTINCT ON: