Looks like simple math. Assuming that all the ranges are of the same inclusive-exclusive type [)
:
SELECT some_data, int4range(lower(a), LEAST(upper(a),lower(b))) AS ab
FROM test
WHERE lower(a) < lower(b)
UNION ALL
SELECT some_data, int4range(GREATEST(lower(a),upper(b)), upper(a))
FROM test
WHERE upper(b) < upper(a)
UNION ALL
SELECT some_data, a
FROM test
WHERE b = int4range(0,0)
OR a = int4range(0,0) ;
Tested at SQL-Fiddle.
A couple of suggestions:
Use legal, lower-case, unquoted identifiers to save yourself a lot of confusion. end
(and, to a lesser degree start
) are reserved words.
Your id
column seems to be numeric type like integer
, not text
.
Since you don't need a time component in start_day
and end_day
, the appropriate data type is date
, not timestamp
.
Why concatenate id
and the year for a new "ID"? Add a second column year
if you need it for a PK. Or don't add an additional column at all. It can cheaply be extracted from the new start_day
on the fly. Generally, don't store data redundantly if you can avoid it.
Typically, timestamp
ranges use an inclusive lower bound and an exclusive upper bound. Since timestamps can have fractional digits (up to 6 in Postgres) that is much cleaner. Your input 2003-12-31 23:59:59
would fail for 2003-12-31 23:59:59.123
.
So, your table could look like this:
CREATE TABLE ad (
id int PRIMARY KEY
, start_day date NOT NULL -- *inclusive* lower bound
, end_day date NOT NULL -- *exclusive* upper bound
CHECK (end_day > start_day) -- enforce legal input
);
Proper test values:
INSERT INTO ad(id, start_day, end_day)
VALUES
(1, '2003-06-07', '2004-10-01') -- span 2 years (your example)
, (2, '2003-06-07', '2003-06-08') -- 1 day in same year
, (3, '2003-06-07', '2003-10-01') -- span 1 year
, (4, '2003-06-07', '2006-10-01'); -- span many years
Solution
Use generate_series()
in a LATERAL
join, based on start and end day, truncated to the year with date_trunc()
. This produces one row per year with the new start date. Add a year and you have the new end date. Except for first and last row per id
, where you substitute the proper start / end with GREATEST
and LEAST
respectively. Voilá.
-- CREATE TABLE ad_year AS
SELECT ad.id
, extract('year' FROM y)::int AS year
, GREATEST(y , ad.start_day) AS start_day
, LEAST (y + interval '1 year', ad.end_day) AS end_day
FROM ad
, generate_series(date_trunc('year', start_day::timestamp) -- cast to ts here!
, date_trunc('year', end_day::timestamp)
, interval '1 year') y;
Note that date_trunc()
returns timestamptz
for timestamptz
input and timestamp
for timestamp
input. For date
input it defaults to timestamptz
. Since you seem to be ignoring time zones, cast the date
to timestamp
explicitly (start_day::timestamp
).
Result:
id | year | start_day | end_day
----+------+---------------------+---------------------
1 | 2003 | 2003-06-07 00:00:00 | 2004-01-01 00:00:00
1 | 2004 | 2004-01-01 00:00:00 | 2004-10-01 00:00:00
2 | 2003 | 2003-06-07 00:00:00 | 2003-06-08 00:00:00
3 | 2003 | 2003-06-07 00:00:00 | 2003-10-01 00:00:00
4 | 2003 | 2003-06-07 00:00:00 | 2004-01-01 00:00:00
4 | 2004 | 2004-01-01 00:00:00 | 2005-01-01 00:00:00
4 | 2005 | 2005-01-01 00:00:00 | 2006-01-01 00:00:00
4 | 2006 | 2006-01-01 00:00:00 | 2006-10-01 00:00:00
If you create a new table from the result, I suggest (id, year)
as primary key.
Aside: This is not an operator: ||'_'||
(nor a cute little face). It's 2 concatenation operators ||
and a string literal '_'
.
Best Answer
A domain is a constraint. It doesn't hold any values whatsoever. It can only restrict a type to exclude values. If you need to store a range, use the range types.
As a side note, you can in fact, create a domain on a range type. In the event you want all of your ranges to start on a Wednesday
Moreover, you can also ensure the range is always one year:
But if you're going to do that -- make it such that your range is always one year and a fixed size, I suggest you just store the year that you start as
::smallint
and be done with it.