I'm trying to discard multiple records that may or may not overlap based on the smallest possible contiguous ranges. I thought of doing something similar to This, however the ranges are numeric strings on separate column and i have on the same query 4 more fields where I only need to take the record with the smallest range
Data with simplified fields
create table invoices(
eventname varchar,
/*...many fields*/
quantity varchar,
section varchar,
rownumber varchar,
secondrow varchar,
lowseat varchar,
highseat varchar,
/*...some more fields*/
status varchar,
/*...even more fields*/
created_at timestamp default now() not null,
updated_at timestamp
);
INSERT INTO public.invoices (eventname, quantity, section, rownumber, secondrow, lowseat, highseat, status, created_at, updated_at) VALUES ('2018 ACC Basketball Tournament - Session 4 (Miami vs North Carolina and Duke vs Notre Dame)', '2', '227', '15', null, '9', '10', 'DEPLETED' , '2019-02-06 00:46:13.286828', null);
INSERT INTO public.invoices (eventname, quantity, section, rownumber, secondrow, lowseat, highseat, status, created_at, updated_at) VALUES ('2018 ACC Basketball Tournament - Session 4 (Miami vs North Carolina and Duke vs Notre Dame)', '2', '227', '15', null, '7', '8', 'DEPLETED' , '2019-02-06 00:46:13.286828', null);
INSERT INTO public.invoices (eventname, quantity, section, rownumber, secondrow, lowseat, highseat, status, created_at, updated_at) VALUES ('2018 ACC Basketball Tournament - Session 4 (Miami vs North Carolina and Duke vs Notre Dame)', '2', '227', '14', null, '23', '24', 'DEPLETED', '2019-02-06 00:46:13.286828', null);
INSERT INTO public.invoices (eventname, quantity, section, rownumber, secondrow, lowseat, highseat, status, created_at, updated_at) VALUES ('2018 ACC Basketball Tournament - Session 4 (Miami vs North Carolina and Duke vs Notre Dame)', '1', '227', '13', null, '21', '21', 'DEPLETED', '2019-02-06 00:46:13.286828', null);
INSERT INTO public.invoices (eventname, quantity, section, rownumber, secondrow, lowseat, highseat, status, created_at, updated_at) VALUES ('2018 ACC Basketball Tournament - Session 4 (Miami vs North Carolina and Duke vs Notre Dame)', '8', '227', '14', null, '15', '22', 'DEPLETED', '2019-02-06 00:46:13.286828', null);
INSERT INTO public.invoices (eventname, quantity, section, rownumber, secondrow, lowseat, highseat, status, created_at, updated_at) VALUES ('2018 ACC Basketball Tournament - Session 4 (Miami vs North Carolina and Duke vs Notre Dame)', '1', '227', '14', null, '1', '1', 'DEPLETED', '2019-02-06 00:46:13.286828', null);
INSERT INTO public.invoices (eventname, quantity, section, rownumber, secondrow, lowseat, highseat, status, created_at, updated_at) VALUES ('2018 ACC Basketball Tournament - Session 4 (Miami vs North Carolina and Duke vs Notre Dame)', '2', 'A57', 'GA', null, '1', '2', 'DEPLETED', '2019-02-06 00:46:13.286828', null);
INSERT INTO public.invoices (eventname, quantity, section, rownumber, secondrow, lowseat, highseat, status, created_at, updated_at) VALUES ('2018 ACC Basketball Tournament - Session 4 (Miami vs North Carolina and Duke vs Notre Dame)', '3', 'A57', 'GA', null, '3', '5', 'DEPLETED', '2019-02-06 00:46:13.286828', null);
INSERT INTO public.invoices (eventname, quantity, section, rownumber, secondrow, lowseat, highseat, status, created_at, updated_at) VALUES ('2018 ACC Basketball Tournament - Session 5 (Virginia vs. Clemson and Duke vs. North Carolina)', '3', '228', '14', null, '1', '3', 'DEPLETED', '2019-02-06 00:46:13.286828', null);
INSERT INTO public.invoices (eventname, quantity, section, rownumber, secondrow, lowseat, highseat, status, created_at, updated_at) VALUES ('Penn State Nittany Lions at Pittsburgh Panthers', '2', '227', 'K', null, '25', '26', 'DEPLETED', '2019-02-06 00:46:13.286828', null);
Visual representation:
Group 1
1 | =====
2 | === --> take this record with all its values
Group 2
3 | === --> take this record
Group 3
4 | =======
5 | == --> take this record
6 | =====
- Adjacent ranges should be merged.
- Lower and upper bounds to be inclusive does fits best for seat numbers.
I did the following and it returns the same values for everything so i know its not right
SELECT distinct section, rownumber,
min(COALESCE(lowseat, '')) over
(partition by grp) as lowseat,
max(maxhighseat) over (partition by grp) AS highseat
FROM (
SELECT *, count(nextstart > maxhighseat OR NULL) OVER (PARTITION BY section,
rownumber ORDER BY lowseat desc, highseat desc NULLS LAST) AS grp
FROM (
SELECT section, rownumber, lowseat, highseat, max(COALESCE(highseat, '')) OVER (PARTITION BY section, rownumber ORDER BY lowseat, highseat) AS maxhighseat
, lead(lowseat) OVER (PARTITION BY section, rownumber ORDER BY lowseat, highseat) As nextstart
FROM invoices where status <> 'DEPLETED' and eventname like 'UCLA%'
) a
) b
ORDER BY 1;
Table important fields look like:
id | section | row | lowseat | highseat | created_at
----+---------------------------------------------------------------
1 | 14 | 18 | 1 | 15 | 2019-01-01T00:00:00.000Z
2 | 14 | 18 | 4 | 15 | 2019-01-01T00:00:00.000Z
3 | 12 | 13 | 2 | 13 | 2019-02-01T00:00:00.000Z
4 | 14 | 18 | 4 | 12 | 2019-01-01T00:00:00.000Z
Best Answer
This is a classical gaps-and-island question. The question itself has still a number of gaps, no pun intended. Filling in with some ...
Assumptions
lowseat
andhighseat
seem to be lower & upper bound of your ranges, obviouslyinteger
numbers, but stored asvarchar
. Change that, or you have to add type cast to my following query.You did not define whether adjacent ranges should be merged or separate. Assuming separate, since those are not strictly "overlapping".
Assuming lower and upper bounds to be inclusive, fits best for seat numbers.
Ignoring query predicates that don't line up with the sample data.
Query
db<>fiddle here
This is based on
lowseat
andhighseat
, the rest of the row is just ballast.Related answer with more explanation and an alternative procedural implementation:
About
DISTINCT ON
: