PostgreSQL – Creating DOMAIN on a Range of Years

domainpostgresql

I have to create a custom domain that is an interval between two years, but i actually need both years.
For example:

'2013/2014'
'2015/2016'

Is there a way to pack both years inside a single domain, or have i to separate them in two attributes of the relation?

Best Answer

A domain is a constraint. It doesn't hold any values whatsoever. It can only restrict a type to exclude values. If you need to store a range, use the range types.

As a side note, you can in fact, create a domain on a range type. In the event you want all of your ranges to start on a Wednesday

CREATE DOMAIN foo AS daterange
  CHECK ( EXTRACT(dow FROM lower(VALUE)) = 4 );

SELECT daterange('2018-01-02'::date,'2018-02-05'::date);

SELECT daterange('2018-01-02'::date,'2018-02-05'::date)::foo;
ERROR:  value for domain foo violates check constraint "foo_check"

Moreover, you can also ensure the range is always one year:

CREATE DOMAIN bar AS daterange
  CHECK (
    (lower(VALUE)::timestamp without time zone - upper(VALUE)::timestamp without time zone) = '-365 days'
  );

But if you're going to do that -- make it such that your range is always one year and a fixed size, I suggest you just store the year that you start as ::smallint and be done with it.

Related Solutions

PostgreSQL – Multi-Valued Range Subtraction

Looks like simple math. Assuming that all the ranges are of the same inclusive-exclusive type [):

SELECT some_data, int4range(lower(a), LEAST(upper(a),lower(b))) AS ab
FROM test
WHERE lower(a) < lower(b) 

UNION ALL

SELECT some_data, int4range(GREATEST(lower(a),upper(b)), upper(a)) 
FROM test
WHERE upper(b) < upper(a) 

UNION ALL

SELECT some_data, a
FROM test
WHERE b = int4range(0,0)
   OR a = int4range(0,0) ;

Tested at SQL-Fiddle.

PostgreSQL – How to Split Up Interval into Year Slices

A couple of suggestions:

Use legal, lower-case, unquoted identifiers to save yourself a lot of confusion. end (and, to a lesser degree start) are reserved words.
Your id column seems to be numeric type like integer, not text.
Since you don't need a time component in start_day and end_day, the appropriate data type is date, not timestamp.
Why concatenate id and the year for a new "ID"? Add a second column year if you need it for a PK. Or don't add an additional column at all. It can cheaply be extracted from the new start_day on the fly. Generally, don't store data redundantly if you can avoid it.
Typically, timestamp ranges use an inclusive lower bound and an exclusive upper bound. Since timestamps can have fractional digits (up to 6 in Postgres) that is much cleaner. Your input 2003-12-31 23:59:59 would fail for 2003-12-31 23:59:59.123.

So, your table could look like this:

CREATE TABLE ad (
  id        int PRIMARY KEY
, start_day date NOT NULL    -- *inclusive* lower bound
, end_day   date NOT NULL    -- *exclusive* upper bound
CHECK (end_day > start_day)  -- enforce legal input
);

Proper test values:

INSERT INTO ad(id, start_day, end_day)
VALUES
  (1, '2003-06-07', '2004-10-01')   -- span 2 years (your example)
, (2, '2003-06-07', '2003-06-08')   -- 1 day in same year
, (3, '2003-06-07', '2003-10-01')   -- span 1 year
, (4, '2003-06-07', '2006-10-01');  -- span many years

Solution

Use generate_series() in a LATERAL join, based on start and end day, truncated to the year with date_trunc(). This produces one row per year with the new start date. Add a year and you have the new end date. Except for first and last row per id, where you substitute the proper start / end with GREATEST and LEAST respectively. Voilá.

-- CREATE TABLE ad_year AS
SELECT ad.id
     , extract('year' FROM y)::int AS year
     , GREATEST(y                    , ad.start_day) AS start_day
     , LEAST   (y + interval '1 year', ad.end_day)   AS end_day
FROM   ad
     , generate_series(date_trunc('year', start_day::timestamp)  -- cast to ts here!
                     , date_trunc('year', end_day::timestamp)
                     , interval '1 year') y;

Note that date_trunc() returns timestamptz for timestamptz input and timestamp for timestamp input. For date input it defaults to timestamptz. Since you seem to be ignoring time zones, cast the date to timestamp explicitly (start_day::timestamp).

Result:

 id | year |      start_day      |       end_day
----+------+---------------------+---------------------
  1 | 2003 | 2003-06-07 00:00:00 | 2004-01-01 00:00:00
  1 | 2004 | 2004-01-01 00:00:00 | 2004-10-01 00:00:00
  2 | 2003 | 2003-06-07 00:00:00 | 2003-06-08 00:00:00
  3 | 2003 | 2003-06-07 00:00:00 | 2003-10-01 00:00:00
  4 | 2003 | 2003-06-07 00:00:00 | 2004-01-01 00:00:00
  4 | 2004 | 2004-01-01 00:00:00 | 2005-01-01 00:00:00
  4 | 2005 | 2005-01-01 00:00:00 | 2006-01-01 00:00:00
  4 | 2006 | 2006-01-01 00:00:00 | 2006-10-01 00:00:00

If you create a new table from the result, I suggest (id, year) as primary key.

Aside: This is not an operator: ||'_'|| (nor a cute little face). It's 2 concatenation operators || and a string literal '_'.

Best Answer

Related Solutions

PostgreSQL – Multi-Valued Range Subtraction

PostgreSQL – How to Split Up Interval into Year Slices

Solution

Related Question