Well this answer depends on how your database is designed and licensed.
If you have an Enterprise license, you should take a look at partitioning. This way you may partition your data which will help you in querying those table. If you make a good partitioning, you'll be able to just read a small portion of the table from the disc, instead of querying everything.
I don't know how your application writes to your database. But maybe it's possible to flag the first record of an hour so that you don't need to filter it later on that way?
Another idea is a SQL Agent, which will perform those aggregations in the background for you. Maybe this could be hourly and write all aggregated values to another cumulative table, which could be queried by you even faster.
A couple of suggestions:
Use legal, lower-case, unquoted identifiers to save yourself a lot of confusion. end
(and, to a lesser degree start
) are reserved words.
Your id
column seems to be numeric type like integer
, not text
.
Since you don't need a time component in start_day
and end_day
, the appropriate data type is date
, not timestamp
.
Why concatenate id
and the year for a new "ID"? Add a second column year
if you need it for a PK. Or don't add an additional column at all. It can cheaply be extracted from the new start_day
on the fly. Generally, don't store data redundantly if you can avoid it.
Typically, timestamp
ranges use an inclusive lower bound and an exclusive upper bound. Since timestamps can have fractional digits (up to 6 in Postgres) that is much cleaner. Your input 2003-12-31 23:59:59
would fail for 2003-12-31 23:59:59.123
.
So, your table could look like this:
CREATE TABLE ad (
id int PRIMARY KEY
, start_day date NOT NULL -- *inclusive* lower bound
, end_day date NOT NULL -- *exclusive* upper bound
CHECK (end_day > start_day) -- enforce legal input
);
Proper test values:
INSERT INTO ad(id, start_day, end_day)
VALUES
(1, '2003-06-07', '2004-10-01') -- span 2 years (your example)
, (2, '2003-06-07', '2003-06-08') -- 1 day in same year
, (3, '2003-06-07', '2003-10-01') -- span 1 year
, (4, '2003-06-07', '2006-10-01'); -- span many years
Solution
Use generate_series()
in a LATERAL
join, based on start and end day, truncated to the year with date_trunc()
. This produces one row per year with the new start date. Add a year and you have the new end date. Except for first and last row per id
, where you substitute the proper start / end with GREATEST
and LEAST
respectively. Voilá.
-- CREATE TABLE ad_year AS
SELECT ad.id
, extract('year' FROM y)::int AS year
, GREATEST(y , ad.start_day) AS start_day
, LEAST (y + interval '1 year', ad.end_day) AS end_day
FROM ad
, generate_series(date_trunc('year', start_day::timestamp) -- cast to ts here!
, date_trunc('year', end_day::timestamp)
, interval '1 year') y;
Note that date_trunc()
returns timestamptz
for timestamptz
input and timestamp
for timestamp
input. For date
input it defaults to timestamptz
. Since you seem to be ignoring time zones, cast the date
to timestamp
explicitly (start_day::timestamp
).
Result:
id | year | start_day | end_day
----+------+---------------------+---------------------
1 | 2003 | 2003-06-07 00:00:00 | 2004-01-01 00:00:00
1 | 2004 | 2004-01-01 00:00:00 | 2004-10-01 00:00:00
2 | 2003 | 2003-06-07 00:00:00 | 2003-06-08 00:00:00
3 | 2003 | 2003-06-07 00:00:00 | 2003-10-01 00:00:00
4 | 2003 | 2003-06-07 00:00:00 | 2004-01-01 00:00:00
4 | 2004 | 2004-01-01 00:00:00 | 2005-01-01 00:00:00
4 | 2005 | 2005-01-01 00:00:00 | 2006-01-01 00:00:00
4 | 2006 | 2006-01-01 00:00:00 | 2006-10-01 00:00:00
If you create a new table from the result, I suggest (id, year)
as primary key.
Aside: This is not an operator: ||'_'||
(nor a cute little face). It's 2 concatenation operators ||
and a string literal '_'
.
Best Answer
I am not entirely sure about your objective, but what you show works with legal identifiers. "end" is a reserved word and I wouldn't use "start" either. Reserved words require double-quoting, best avoided to begin with:
Can be much simpler, though:
db<>fiddle here (raising exceptions to display the result)