I want to store partial information related to a date. I might know the year and month but not the day. I might know the day and month but not the year. I might know the date lies in an open or closed interval. What are my options for modeling this type of data?
Data type for a fuzzy date
datatypes
Related Solutions
I don't know what the best way to store it necessarily is, but there's at least a better option than using a varchar(39) (or varchar(40) if you need it signed): use a decimal(39,0) instead. From the MySQL docs:
Fixed-Point (Exact-Value) Types
The DECIMAL and NUMERIC types store exact numeric data values. These types are used when it is important to preserve exact precision, for example with monetary data. In MySQL, NUMERIC is implemented as DECIMAL, so the following remarks about DECIMAL apply equally to NUMERIC.
MySQL 5.1 stores DECIMAL values in binary format. Before MySQL 5.0.3, they were stored as strings. See Section 11.18, “Precision Math”.
In a DECIMAL column declaration, the precision and scale can be (and usually is) specified; for example:
salary DECIMAL(5,2)
In this example, 5 is the precision and 2 is the scale. The precision represents the number of significant digits that are stored for values, and the scale represents the number of digits that can be stored following the decimal point.
Standard SQL requires that DECIMAL(5,2) be able to store any value with five digits and two decimals, so values that can be stored in the salary column range from -999.99 to 999.99.
In standard SQL, the syntax DECIMAL(M) is equivalent to DECIMAL(M,0). Similarly, the syntax DECIMAL is equivalent to DECIMAL(M,0), where the implementation is permitted to decide the value of M. MySQL supports both of these variant forms of DECIMAL syntax. The default value of M is 10.
If the scale is 0, DECIMAL values contain no decimal point or fractional part.
The maximum number of digits for DECIMAL is 65, but the actual range for a given DECIMAL column can be constrained by the precision or scale for a given column. When such a column is assigned a value with more digits following the decimal point than are permitted by the specified scale, the value is converted to that scale. (The precise behavior is operating system-specific, but generally the effect is truncation to the permissible number of digits.)
It's stored packed, so it'll take up less space than the varchar (18 bytes, if I'm doing my math right), and I'd hope you'd be able to do math on it directly, but I've never tried with that large of a number to see what happens.
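The 18-byte figure can be checked against the packing rule in the MySQL storage requirements documentation: the integer and fractional parts are sized separately, each full group of nine digits takes 4 bytes, and leftover digits take between 1 and 4 bytes. A minimal sketch in Python of that arithmetic (the function names are made up for illustration, not part of any MySQL API):

```python
# Bytes needed for the digits left over after full groups of nine.
# Index = number of leftover digits (0..8), per the MySQL storage rule.
LEFTOVER_BYTES = [0, 1, 1, 2, 2, 3, 3, 4, 4]


def _part_bytes(digits: int) -> int:
    """Bytes for one part (integer or fractional) of a DECIMAL value."""
    full_groups, leftover = divmod(digits, 9)
    return full_groups * 4 + LEFTOVER_BYTES[leftover]


def decimal_storage_bytes(precision: int, scale: int = 0) -> int:
    """Storage for DECIMAL(precision, scale) in MySQL's binary format."""
    return _part_bytes(precision - scale) + _part_bytes(scale)


print(decimal_storage_bytes(39, 0))  # 4 groups * 4 bytes + 2 bytes for 3 leftover digits
print(decimal_storage_bytes(5, 2))   # the salary example from the quoted docs
```

For decimal(39,0) this gives 18 bytes, which matches the estimate above, against 39 bytes of character data plus length overhead for the varchar.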
No, the interval type supports reduced precision but none of the other date/time types do.
Postgres allows you to roll your own with create type, but unfortunately won't allow constraints to be added to the type, which limits its usefulness in this scenario. The best I can come up with requires you to repeat check constraints on every field where the fuzzy type is used:
create type preciseness as enum('day', 'month', 'year');
create type fuzzytimestamptz as (ts timestamptz, p preciseness);
create table t( id serial primary key,
                fuzzy fuzzytimestamptz
                  check( (fuzzy).ts is not null
                         or ((fuzzy).ts is null and (fuzzy).p is not null) ),
                check((fuzzy).ts=date_trunc('year', (fuzzy).ts) or (fuzzy).p<'year'),
                check((fuzzy).ts=date_trunc('month', (fuzzy).ts) or (fuzzy).p<'month'),
                check((fuzzy).ts=date_trunc('day', (fuzzy).ts) or (fuzzy).p<'day') );
insert into t(fuzzy) values (row(date_trunc('year', current_timestamp), 'year'));
insert into t(fuzzy) values (row(date_trunc('month', current_timestamp), 'month'));
insert into t(fuzzy) values (row(date_trunc('day', current_timestamp), 'day'));
select * from t;
 id |              fuzzy
----+----------------------------------
  1 | ("2011-01-01 00:00:00+00",year)
  2 | ("2011-09-01 00:00:00+01",month)
  3 | ("2011-09-23 00:00:00+01",day)
--edit - an example equality operator:
create function fuzzytimestamptz_equality(fuzzytimestamptz, fuzzytimestamptz)
                returns boolean language plpgsql immutable as $$
begin
  return     ($1.ts, $1.ts+coalesce('1 '||$1.p, '0')::interval)
    overlaps ($2.ts, $2.ts+coalesce('1 '||$2.p, '0')::interval);
end;$$;
--
create operator = ( procedure=fuzzytimestamptz_equality,
                    leftarg=fuzzytimestamptz,
                    rightarg=fuzzytimestamptz );
sample query:
select *, fuzzy=row(statement_timestamp(), null)::fuzzytimestamptz as equals_now,
fuzzy=row(statement_timestamp()+'1 day'::interval, null)::fuzzytimestamptz as equals_tomorrow,
fuzzy=row(date_trunc('month', statement_timestamp()), 'month')::fuzzytimestamptz as equals_fuzzymonth,
fuzzy=row(date_trunc('month', statement_timestamp()+'1 month'::interval), 'month')::fuzzytimestamptz as equals_fuzzynextmonth
from t;
 id |               fuzzy                | equals_now | equals_tomorrow | equals_fuzzymonth | equals_fuzzynextmonth
----+------------------------------------+------------+-----------------+-------------------+-----------------------
  1 | ("2011-01-01 00:00:00+00",year)    | t          | t               | t                 | t
  2 | ("2011-09-01 00:00:00+01",month)   | t          | t               | t                 | f
  3 | ("2011-09-24 00:00:00+01",day)     | t          | f               | t                 | f
  4 | ("2011-09-24 11:45:23.810589+01",) | f          | f               | t                 | f
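To make the semantics of that operator easier to follow, here is a rough model of it in Python. It is a sketch under assumptions, not a port: it uses naive datetimes instead of timestamptz, it assumes the timestamp is already truncated to its precision (as the check constraints above enforce), and the names period and fuzzy_equal are invented for illustration. The overlap rule follows what the PostgreSQL documentation says about OVERLAPS: each period is half-open (start <= time < end) unless start equals end, in which case it is a single instant.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

Fuzzy = Tuple[datetime, Optional[str]]  # (ts, precision) with precision in day/month/year/None


def period(ts: datetime, p: Optional[str]) -> Tuple[datetime, datetime]:
    """Return (start, end) covered by a fuzzy value; assumes ts is truncated to p."""
    if p is None:
        return ts, ts  # exact timestamp: a single instant
    if p == "day":
        return ts, ts + timedelta(days=1)
    if p == "month":
        if ts.month == 12:
            return ts, ts.replace(year=ts.year + 1, month=1)
        return ts, ts.replace(month=ts.month + 1)
    if p == "year":
        return ts, ts.replace(year=ts.year + 1)
    raise ValueError("unknown precision: %r" % (p,))


def fuzzy_equal(a: Fuzzy, b: Fuzzy) -> bool:
    """Two fuzzy values are 'equal' when their periods overlap."""
    s1, e1 = period(*a)
    s2, e2 = period(*b)
    if s1 == e1 and s2 == e2:  # two instants
        return s1 == s2
    if s1 == e1:               # instant against half-open period
        return s2 <= s1 < e2
    if s2 == e2:
        return s1 <= s2 < e1
    return s1 < e2 and s2 < e1  # two half-open periods


september = (datetime(2011, 9, 1), "month")
exact = (datetime(2011, 9, 24, 11, 45, 23), None)
print(fuzzy_equal(september, exact))                          # instant inside the month
print(fuzzy_equal(september, (datetime(2011, 10, 1), "month")))  # adjacent months
```

One consequence of defining equality as overlap is that it is not transitive: two different days can both equal the month that contains them without equalling each other.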
Best Answer
I've done this for attorneys in the past. I used an ISO-style date format (yyyy-mm-dd) stored as char(10). Any missing part used question marks.
Values like these are intuitive to people doing data entry, they sort fairly sensibly, and the format can be controlled with CHECK constraints. You lose date and time arithmetic, but when you have unknown dates that usually doesn't matter too much.
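As a rough illustration of the format rule and the sort order, here is a small Python sketch. The exact layout is an assumption on my part (each of year, month and day is either fully known or fully replaced by question marks), and the names PARTIAL_DATE and is_partial_date are invented for the example. In the database the same rule would live in a CHECK constraint.

```python
import re

# Assumed layout: yyyy-mm-dd, where each part is all digits or all '?'.
# Month and day ranges are checked loosely; calendar validity (e.g. Feb 30)
# is NOT checked here.
PARTIAL_DATE = re.compile(
    r"([0-9]{4}|\?{4})-(0[1-9]|1[0-2]|\?\?)-(0[1-9]|[12][0-9]|3[01]|\?\?)"
)


def is_partial_date(value: str) -> bool:
    """True when value matches the assumed char(10) partial-date layout."""
    return PARTIAL_DATE.fullmatch(value) is not None


samples = ["2011-??-??", "2011-09-23", "2011-09-??", "2010-12-31"]
print(all(is_partial_date(s) for s in samples))
print(is_partial_date("2011-13-01"))  # month out of range
print(sorted(samples))                # plain string sort
```

Because '?' sorts after the digits in ASCII, a plain string sort puts a partially known date after the fully known dates that share its prefix, which is the "fairly sensible" ordering mentioned above.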
Knowing that a date lies in an open or closed interval is actually a different kind of information than, say, knowing that something happened on April 1 but not knowing which year. Off the top of my head, you could store four columns.
I'd expect to have to store that kind of information separately from the kind of data I mentioned first. "January 8th in an unknown year" would be particularly troublesome to represent in either an open or closed interval.
PostgreSQL 9.2 (the latest release at the time of writing) includes support for range types.
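A partial date with a known year maps naturally onto a range. The sketch below, in Python, expands the question-mark format described earlier into a half-open interval [lower, upper), which is also the canonical form PostgreSQL uses for its built-in daterange type. The function name and the error handling are assumptions made for the example; an unknown year is rejected because, as noted above, it does not correspond to any single interval.

```python
from datetime import date, timedelta
from typing import Tuple


def to_half_open_interval(partial: str) -> Tuple[date, date]:
    """Expand 'yyyy-mm-dd' with '?' placeholders into [lower, upper)."""
    year_s, month_s, day_s = partial.split("-")
    if "?" in year_s:
        raise ValueError("an unknown year cannot be expressed as one interval")
    year = int(year_s)
    if month_s == "??":
        if day_s != "??":
            raise ValueError("known day with unknown month is not one interval")
        return date(year, 1, 1), date(year + 1, 1, 1)
    month = int(month_s)
    if day_s == "??":
        first = date(year, month, 1)
        # First day of the following month, rolling over the year in December.
        upper = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        return first, upper
    day = date(year, month, int(day_s))
    return day, day + timedelta(days=1)


print(to_half_open_interval("2011-09-??"))
print(to_half_open_interval("2011-??-??"))
```

The resulting bounds could then be stored in a daterange column and compared with the range operators, while values such as "January 8th in an unknown year" would still need a separate representation.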