No, the interval type supports reduced precision but none of the other date/time types do.
Postgres allows you to roll your own with create type
but unfortunately wont allow contraints to be added to the type which limits it's usefulness in this scenario. The best I can come up with requires you to repeat check constraints on every field where the fuzzy
type is used:
create type preciseness as enum('day', 'month', 'year');
create type fuzzytimestamptz as (ts timestamptz, p preciseness);
create table t( id serial primary key,
fuzzy fuzzytimestamptz
check( (fuzzy).ts is not null
or ((fuzzy).ts is null and (fuzzy).p is not null) ),
check((fuzzy).ts=date_trunc('year', (fuzzy).ts) or (fuzzy).p<'year'),
check((fuzzy).ts=date_trunc('month', (fuzzy).ts) or (fuzzy).p<'month'),
check((fuzzy).ts=date_trunc('day', (fuzzy).ts) or (fuzzy).p<'day') );
insert into t(fuzzy) values (row(date_trunc('year', current_timestamp), 'year'));
insert into t(fuzzy) values (row(date_trunc('month', current_timestamp), 'month'));
insert into t(fuzzy) values (row(date_trunc('day', current_timestamp), 'day'));
select * from t;
id | fuzzy
----+----------------------------------
1 | ("2011-01-01 00:00:00+00",year)
2 | ("2011-09-01 00:00:00+01",month)
3 | ("2011-09-23 00:00:00+01",day)
--edit - an example equality operator:
create function fuzzytimestamptz_equality(fuzzytimestamptz, fuzzytimestamptz)
returns boolean language plpgsql immutable as $$
begin
return ($1.ts, $1.ts+coalesce('1 '||$1.p, '0')::interval)
overlaps ($2.ts, $2.ts+coalesce('1 '||$2.p, '0')::interval);
end;$$;
--
create operator = ( procedure=fuzzytimestamptz_equality,
leftarg=fuzzytimestamptz,
rightarg=fuzzytimestamptz );
sample query:
select *, fuzzy=row(statement_timestamp(), null)::fuzzytimestamptz as equals_now,
fuzzy=row(statement_timestamp()+'1 day'::interval, null)::fuzzytimestamptz as equals_tomorrow,
fuzzy=row(date_trunc('month', statement_timestamp()), 'month')::fuzzytimestamptz as equals_fuzzymonth,
fuzzy=row(date_trunc('month', statement_timestamp()+'1 month'::interval), 'month')::fuzzytimestamptz as equals_fuzzynextmonth
from t;
id | fuzzy | equals_now | equals_tomorrow | equals_fuzzymonth | equals_fuzzynextmonth
----+------------------------------------+------------+-----------------+-------------------+-----------------------
1 | ("2011-01-01 00:00:00+00",year) | t | t | t | t
2 | ("2011-09-01 00:00:00+01",month) | t | t | t | f
3 | ("2011-09-24 00:00:00+01",day) | t | f | t | f
4 | ("2011-09-24 11:45:23.810589+01",) | f | f | t | f
1st case
You seem to forget the valid_during
range. As your third case suggests, there can be multiple entries per (rec_id, val)
, so you must select the right one:
UPDATE master m
SET valid_on = f_array_sort(m.valid_on || u.valid_on) -- sorted array, see below
FROM updates u
WHERE m.rec_id = u.rec_id
AND m.valid_during @> u.valid_on -- additional check
AND m.val = u.val
AND NOT m.valid_on @> ARRAY[u.valid_on];
I assume the whole possible date range is always covered for each existing rec_id
and valid_during
shall not overlap per rec_id
, or you'd have to do more.
After installing the additional module btree_gist
, add an exclusion constraint to rule out overlapping date ranges if you don't have one, yet:
ALTER TABLE master ADD CONSTRAINT EXCLUDE
USING gist (rec_id WITH =, valid_during WITH &&) -- disallow overlap
The GiST index this is implemented with is also a perfect match for the query. Details:
2nd / 3rd case
Assuming that every date range starts with the smallest date in the (now sorted!) array: lower(m.valid_during) = m.valid_on[1]
. I would enforce that with a CHECK
constraint.
Here we need to create one or two new rows
In the 2nd case it is enough to shrink the range of the old row and insert one new row
In the 3rd case we update the old row with the left half of array and range, insert the new row and finally insert the with the right half of array and range.
Helper functions
To keep it simple I introduce a new constraint: every array is sorted. Use this helper function
-- sort array
CREATE OR REPLACE FUNCTION f_array_sort(anyarray)
RETURNS anyarray LANGUAGE sql IMMUTABLE AS
$$SELECT ARRAY (SELECT unnest($1) ORDER BY 1)$$;
I don't need your helper function arraymin()
any more, but it could be simplified to:
CREATE OR REPLACE FUNCTION f_array_min(anyarray)
RETURNS anyelement LANGUAGE sql IMMUTABLE AS
$$SELECT min(a) FROM unnest($1) a$$;
Two more to get the left and right half of an array split at a given element:
-- split left array at given element
CREATE OR REPLACE FUNCTION f_array_left(anyarray, anyelement)
RETURNS anyarray LANGUAGE sql IMMUTABLE AS
$$SELECT ARRAY (SELECT * FROM unnest($1) a WHERE a < $2 ORDER BY 1)$$;
-- split right array at given element
CREATE OR REPLACE FUNCTION f_array_right(anyarray, anyelement)
RETURNS anyarray LANGUAGE sql IMMUTABLE AS
$$SELECT ARRAY (SELECT * FROM unnest($1) a WHERE a >= $2 ORDER BY 1)$$;
Query
This does all the rest:
WITH u AS ( -- identify candidates
SELECT m.id, rec_id, m.val, m.valid_on, m.valid_during
, u.val AS u_val, u.valid_on AS u_valid_on
FROM master m
JOIN updates u USING (rec_id)
WHERE m.val <> u.val
AND m.valid_during @> u.valid_on
FOR UPDATE -- lock for update
)
, upd1 AS ( -- case 2: no overlap, no split
UPDATE master m -- shrink old row
SET valid_during = daterange(lower(u.valid_during), u.u_valid_on)
FROM u
WHERE u.id = m.id
AND u.u_valid_on > m.valid_on[array_upper(m.valid_on, 1)]
RETURNING m.id
)
, ins1 AS ( -- insert new row
INSERT INTO master (rec_id, val, valid_on, valid_during)
SELECT u.rec_id, u.u_val, ARRAY[u.u_valid_on]
, daterange(u.u_valid_on, upper(u.valid_during))
FROM upd1
JOIN u USING (id)
)
, upd2 AS ( -- case 3: overlap, need to split row
UPDATE master m -- shrink to first half
SET valid_during = daterange(lower(u.valid_during), u.u_valid_on)
, valid_on = f_array_left(u.valid_on, u.u_valid_on)
FROM u
LEFT JOIN upd1 USING (id)
WHERE upd1.id IS NULL -- all others
AND u.id = m.id
RETURNING m.id, f_array_right(u.valid_on, u.u_valid_on) AS arr_right
)
INSERT INTO master (rec_id, val, valid_on, valid_during)
-- new row
SELECT u.rec_id, u.u_val, ARRAY[u.u_valid_on]
, daterange(u.u_valid_on, upd2.arr_right[1])
FROM upd2
JOIN u USING (id)
UNION ALL -- second half of old row
SELECT u.rec_id, u.val, upd2.arr_right
, daterange(upd2.arr_right[1], upper(u.valid_during))
FROM upd2
JOIN u USING (id);
SQL Fiddle.
Notes
You need to understand the concept of data-modifying CTEs (writeable CTEs), before you touch this. Judging from the code you provided, you know your way around Postgres.
FOR UPDATE
is to avoid race conditions with concurrent write access. If you are the only user writing to the tables, you don't need it.
I took a piece of paper and drew a timeline so not to get lost in all of this.
Each row is only updated / inserted once, and operations are simple and roughly optimized. No expensive window functions. This should perform well. Much faster than your previous approach in any case.
It would be a bit less confusing if you'd use distinct column names for u.valid_on
and m.valid_on
, which are related but different things.
I compute the right half of the split array in the RETURNING
clause of CTE upd2
: f_array_right(u.valid_on, u.u_valid_on) AS arr_right
, because I need it several times in the next step. This is a (legal) trick to save one more CTE.
As for solutions that don't involve unnesting the master table
: You have to unnest the array valid_on
either way, to split it, at least as long as it's not sorted. Also, your helper function arraymin()
is already unnesting it anyway.
Best Answer
If possible, just store the first value
It doesn't look like you need to store two numbers at all.
If these are like tax years, i.e. "2014/2015 financial year", you only need to consistently store one part.
If you store
2014
, your app knows it's the academic year beginning in 2014. So it can display "2014/2015 Academic Year" with no ambiguity.If you really need a range
If you actually need ranges and your sample data doesn't express that, e.g. you might need to store
2012/2015
, then twoint2
columns would be a good choice. You'd useCHECK
constraints to limit the range to a reasonable value for an academic year and to enforce that the lower bound is less than the higher bound, as @Neil has already showed.Alternately you could use the new-ish and PostgreSQL-specific
int4range
type:per the documentation on range types. This offers you lots more operators at the cost of being fiddlier to work with for basic queries and from client apps.
You can even make a
DOMAIN
over it to confine its range.e.g.
(Before range types I would've suggested a domain over a composite type, but
int4range
is better in every way than a composite type).Most likely all this is unnecessary - if you don't need variable length ranges, i.e. it's always someyear/thenextyear, you should just store the first year.