I have a table (in PostgreSQL 9.4) that looks like this:
CREATE TABLE dates_ranges (kind int, start_date date, end_date date);
INSERT INTO dates_ranges VALUES
(1, '2018-01-01', '2018-01-31'),
(1, '2018-01-01', '2018-01-05'),
(1, '2018-01-03', '2018-01-06'),
(2, '2018-01-01', '2018-01-01'),
(2, '2018-01-01', '2018-01-02'),
(3, '2018-01-02', '2018-01-08'),
(3, '2018-01-05', '2018-01-10');
Now I want to calculate for the given dates and for every kind, into how many rows from dates_ranges
each date falls. Zeros could be possibly omitted.
Desired result:
+-------+------------+----+
| kind | as_of_date | n |
+-------+------------+----+
| 1 | 2018-01-01 | 2 |
| 1 | 2018-01-02 | 2 |
| 1 | 2018-01-03 | 3 |
| 2 | 2018-01-01 | 2 |
| 2 | 2018-01-02 | 1 |
| 3 | 2018-01-02 | 1 |
| 3 | 2018-01-03 | 1 |
+-------+------------+----+
I've come up with two solutions, one with LEFT JOIN
and GROUP BY
SELECT
kind, as_of_date, COUNT(*) n
FROM
(SELECT d::date AS as_of_date FROM generate_series('2018-01-01'::timestamp, '2018-01-03'::timestamp, '1 day') d) dates
LEFT JOIN
dates_ranges ON dates.as_of_date BETWEEN start_date AND end_date
GROUP BY 1,2 ORDER BY 1,2
and one with LATERAL
, which is slightly faster:
SELECT
kind, as_of_date, n
FROM
(SELECT d::date AS as_of_date FROM generate_series('2018-01-01'::timestamp, '2018-01-03'::timestamp, '1 day') d) dates,
LATERAL
(SELECT kind, COUNT(*) AS n FROM dates_ranges WHERE dates.as_of_date BETWEEN start_date AND end_date GROUP BY kind) ss
ORDER BY kind, as_of_date
I'm wondering is it any better way to write this query? And how to include pairs date-kind with 0 count?
In reality there is a few distinct kinds, period of up to five years (1800 dates), and ~30k rows in dates_ranges
table (but it could grow significantly).
There are no indexes. To be precise in my case it's a result of subquery, but I've wanted to limit question to one issue, so it's more general.
Best Answer
The following query also works if "missing zeros" are OK:
but it's not any faster than the
lateral
version with the small dataset. It might scale better though, as no join is required, but the above version aggregates over all the rows, so it may lose out there again.The following query tries to avoid unnecessary work by removing any series that don't overlap anyway:
-- and I got to use the
overlaps
operator! Note that you have to addinterval '1 day'
to the right as the overlaps operator considers time periods to be open on the right (which is fairly logical because a date is often considered to be a timestamp with time component of midnight).