PostgreSQL – Fastest Way to Count Date Ranges Covering Each Date

functionsjoin;postgresqlpostgresql-9.4

I have a table (in PostgreSQL 9.4) that looks like this:

CREATE TABLE dates_ranges (kind int, start_date date, end_date date);
INSERT INTO dates_ranges VALUES 
    (1, '2018-01-01', '2018-01-31'),
    (1, '2018-01-01', '2018-01-05'),
    (1, '2018-01-03', '2018-01-06'),
    (2, '2018-01-01', '2018-01-01'),
    (2, '2018-01-01', '2018-01-02'),
    (3, '2018-01-02', '2018-01-08'),
    (3, '2018-01-05', '2018-01-10');

Now I want to calculate for the given dates and for every kind, into how many rows from dates_ranges each date falls. Zeros could be possibly omitted.

Desired result:

+-------+------------+----+
|  kind | as_of_date |  n |
+-------+------------+----+
|     1 | 2018-01-01 |  2 |
|     1 | 2018-01-02 |  2 |
|     1 | 2018-01-03 |  3 |
|     2 | 2018-01-01 |  2 |
|     2 | 2018-01-02 |  1 |
|     3 | 2018-01-02 |  1 |
|     3 | 2018-01-03 |  1 |
+-------+------------+----+

I've come up with two solutions, one with LEFT JOIN and GROUP BY

SELECT
kind, as_of_date, COUNT(*) n
FROM
    (SELECT d::date AS as_of_date FROM generate_series('2018-01-01'::timestamp, '2018-01-03'::timestamp, '1 day') d) dates
LEFT JOIN
    dates_ranges ON dates.as_of_date BETWEEN start_date AND end_date
GROUP BY 1,2 ORDER BY 1,2

and one with LATERAL, which is slightly faster:

SELECT
    kind, as_of_date, n
FROM
    (SELECT d::date AS as_of_date FROM generate_series('2018-01-01'::timestamp, '2018-01-03'::timestamp, '1 day') d) dates,
LATERAL
    (SELECT kind, COUNT(*) AS n FROM dates_ranges WHERE dates.as_of_date BETWEEN start_date AND end_date GROUP BY kind) ss
ORDER BY kind, as_of_date

I'm wondering is it any better way to write this query? And how to include pairs date-kind with 0 count?

In reality there is a few distinct kinds, period of up to five years (1800 dates), and ~30k rows in dates_ranges table (but it could grow significantly).

There are no indexes. To be precise in my case it's a result of subquery, but I've wanted to limit question to one issue, so it's more general.

Best Answer

The following query also works if "missing zeros" are OK:

select *
from (
  select
    kind,
    generate_series(start_date, end_date, interval '1 day')::date as d,
    count(*)
  from dates_ranges
  group by 1, 2
) x
where d between date '2018-01-01' and date '2018-01-03'
order by 1, 2;

but it's not any faster than the lateral version with the small dataset. It might scale better though, as no join is required, but the above version aggregates over all the rows, so it may lose out there again.

The following query tries to avoid unnecessary work by removing any series that don't overlap anyway:

select
  kind,
  generate_series(greatest(start_date, date '2018-01-01'), least(end_date, date '2018-01-03'), interval '1 day')::date as d,
  count(*)
from dates_ranges
where (start_date, end_date + interval '1 day') overlaps (date '2018-01-01', date '2018-01-03' + interval '1 day')
group by 1, 2
order by 1, 2;

-- and I got to use the overlaps operator! Note that you have to add interval '1 day' to the right as the overlaps operator considers time periods to be open on the right (which is fairly logical because a date is often considered to be a timestamp with time component of midnight).

Related Solutions

Combining Data from Multiple Tables in MySQL

I splitted the problem into two basic SELECTs, one for members and the another one for guests and I join both later. MySql's GROUP_CONCAT aggregate funtion is very useful here too.

http://sqlfiddle.com/#!9/58786/1

SELECT 
TT1.day as day
,TT1.booking_id as booking_id
,TT1.member_id as member_id
,CASE 
  WHEN TT2.guest_ids IS NULL 
  THEN 0 
  ELSE TT2.guest_ids 
  END as guest_id
,TT1.start_date as start_date
,TT1.end_date as end_date
,CASE 
  WHEN TT2.guest_ids IS NULL
  THEN TT1.member_name
  ELSE CONCAT(TT1.member_name, ', ', TT2.guests_name)
  END as names
FROM
(
SELECT 
Min(day) as day
, booking_id
, member_id
, Members.id
, CONCAT(first_name,' ',last_name) as member_name
, start_date
, end_date   
FROM Bookings
INNER JOIN Members
ON Bookings.member_id = Members.id
WHERE member_id <> 0
GROUP BY booking_id, member_id, guest_id, start_date, end_date
) AS TT1
LEFT JOIN
(
SELECT 
booking_id
, GROUP_CONCAT(guest_id SEPARATOR ', ') as guest_ids
, GROUP_CONCAT(CONCAT(first_name, ' ', last_name) SEPARATOR ', ') as guests_name
FROM
(
SELECT
booking_id
, guest_id
FROM Bookings
WHERE guest_id <> 0
GROUP BY booking_id, guest_id
) AS T1
INNER JOIN Guests
ON T1.guest_id = Guests.id
GROUP BY booking_id
) AS TT2
ON TT1.booking_id = TT2.booking_id
ORDER BY TT1.day, TT1.booking_id

postgresql – Running Complex Queries for Date Ranges

I would use select distinct on instead of window function, then just join the days.

select 
    distinct on (date, client_id) date, 
    id 
from orders
inner join generate_series('2015-07-18'::date, '2015-07-19'::date, '1 day') date
  on start_date <= date and (end_date is null or date <= end_date)
order by date, client_id, order_type desc

http://sqlfiddle.com/#!15/5a420/16/0

I can elaborate more if something is not clear.

Best Answer

Related Solutions

Combining Data from Multiple Tables in MySQL

postgresql – Running Complex Queries for Date Ranges

Related Question