Postgresql – Calculate the duration in an interval and outside of events that can last over several days

intervalpostgresqlpostgresql-11query-performance

I need to calculate the total duration of events in a given time interval and on a given day of the week with PostgreSQL 11

For an event on the same day no problem. But over several days I manage to approximate the result with "generate_series".u Is there another way faster and more precise?

Interval In:

Start---------------------------------------End
        E.begin_date ------ E.end_date                         -> E.end_date -  E.begin_date 


Start--------------------------------End
            E.begin_date -------------------E.end_date         -> End - E.begin_date


                    Start--------------------------------End
E.begin_date -----------------------E.end_date                 -> E.end_date - Start

            Start------------------End
E.begin_date-------------------------------------E.end_date    -> End - Start

Interval In on several Days with generate_series

                            Day 1               Day 2 etc....
                             |                     |
            Start-------End  |  Start---------End  |  Start---------End
E.begin_date--------------------------------------------------------------E.end_date   

Interval Out:

                Start--------End 
  E.begin_date ------------------- E.end_date    -> (Start - E.begin_date) + (E.end_date - End) 


Start--------------------------------End
            E.begin_date -------------------E.end_date         -> E.end_date  - End


                    Start--------------------------------End
E.begin_date -----------------------E.end_date                 -> Start - .begin_date

                               Start--------End
E.begin_date--------E.end_date    -> E.end_date - E.begin_date

Interval Out on several Days with generate_series

                            Day 1               Day 2 etc....
                             |                     |
            Start-------End  |  Start---------End  |  Start---------End
E.begin_date--------------------------------------------------------------E.end_date   

This is my sql request:

select * FROM (SELECT  sum(
         case
             when (begin_date AT TIME ZONE 'Europe/Paris')::date = (end_date AT TIME ZONE 'Europe/Paris')::date -- same days
                 then
                 case
                     -- ------time_begin------------------------------------------------------------------------time_end----
                     --                    begin_date------------------------------------------------end_date
                     when (begin_date AT TIME ZONE 'Europe/Paris')::time >= '08:00'::time
                         AND
                          (end_date AT TIME ZONE 'Europe/Paris')::time <= '18:00'::time
                         then (end_date AT TIME ZONE 'Europe/Paris')::time - (begin_date AT TIME ZONE 'Europe/Paris')::time

                     -- ------time_begin------------------------------------------------------------------------time_end----
                     --                                                 begin_date------------------------------------------------end_date
                     when (begin_date AT TIME ZONE 'Europe/Paris')::time >= '08:00'::time
                         AND
                          (begin_date AT TIME ZONE 'Europe/Paris')::time <= '18:00'::time
                         AND
                          (end_date AT TIME ZONE 'Europe/Paris')::time >= '18:00'::time
                         then '18:00'::time - (begin_date AT TIME ZONE 'Europe/Paris') ::time

                     -- -----------------time_begin------------------------------------------------------------------------time_end----
                     --    begin_date-----------------------------------end_date
                     when (end_date AT TIME ZONE 'Europe/Paris')::time >= '08:00'::time
                         AND
                          (end_date AT TIME ZONE 'Europe/Paris')::time <= '18:00'::time
                         AND
                          (begin_date AT TIME ZONE 'Europe/Paris')::time <= '08:00'::time
                         then (end_date AT TIME ZONE 'Europe/Paris')::time - '08:00'::time

                     -- -----------------time_begin-------------------------------------time_end----
                     --    begin_date----------------------------------------------------------------end_date
                     when (begin_date AT TIME ZONE 'Europe/Paris')::time <= '08:00'::time
                         AND
                          (end_date AT TIME ZONE 'Europe/Paris')::time >= '18:00'::time
                         then '18:00'::time - '08:00'::time
                     END
              when (begin_date AT TIME ZONE 'Europe/Paris')::date < (end_date AT TIME ZONE 'Europe/Paris')::date -- on many days
                 then end_date - begin_date
                 (select COUNT(*) * interval '5 min'
                  FROM generate_series(begin_date AT TIME ZONE 'Europe/Paris', end_date AT TIME ZONE 'Europe/Paris', interval '5 min') m
                  where (m::time AT TIME ZONE 'Europe/Paris')  >= '08:00'::time
                    AND (m::time AT TIME ZONE 'Europe/Paris') <= '18:00'::time
                    AND EXTRACT(dow FROM m) + 1 in (2, 3)
                 )

             END) as total_included
            FROM "event"
            WHERE ( (EXTRACT(dow FROM "event"."begin_date" AT TIME ZONE 'Europe/Paris') + 1 in (2, 3) OR
                   EXTRACT(dow FROM "event"."end_date" AT TIME ZONE 'Europe/Paris') + 1 in (2, 3) )
                AND ("event"."begin_date" AT TIME ZONE 'Europe/Paris')::date >= '2019-12-01T00:00:00'::date
                AND ("event"."begin_date" AT TIME ZONE 'Europe/Paris')::date <= '2020-01-01T00:00:00'::date
                AND (
                           ((event.begin_date AT TIME ZONE 'Europe/Paris')::time between '08:00'::time AND '18:00'::time)
                           OR
                           ((event.end_date AT TIME ZONE 'Europe/Paris')::time between '08:00'::time AND '18:00'::time)
                           OR (
                                   ((event.begin_date AT TIME ZONE 'Europe/Paris')::time < '08:00'::time)
                                   AND
                                   ((event.end_date AT TIME ZONE 'Europe/Paris')::time > '18:00'::time))
                       ))
      order by total_included desc) as included,

     (SELECT sum(
         case
             when (begin_date AT TIME ZONE 'Europe/Paris')::date = (end_date AT TIME ZONE 'Europe/Paris')::date -- same day
                 then
                 case
                     -- -----------------------------------------------------------------time_begin---------------------------------time_end----
                     --          begin_date-------------------------------end_date
                     -- or
                     -- -------------------------------time_begin---------------------------------time_end-----------
                     --          begin_date----------------------------------------------------------------------end_date
                     when ((begin_date AT TIME ZONE 'Europe/Paris')::time NOT BETWEEN '08:00'::time AND '18:00'::time)
                              AND
                          ((end_date AT TIME ZONE 'Europe/Paris')::time NOT BETWEEN '08:00'::time AND '18:00'::time)
                        then
                         case
                             when ((begin_date AT TIME ZONE 'Europe/Paris')::time < '08:00'::time) AND ((end_date AT TIME ZONE 'Europe/Paris')::time < '08:00'::time)
                                then end_date::time - begin_date::time
                             when (begin_date AT TIME ZONE 'Europe/Paris')::time > '18:00'::time AND (end_date AT TIME ZONE 'Europe/Paris')::time > '18:00'::time
                                then end_date::time - begin_date::time
                             when (begin_date AT TIME ZONE 'Europe/Paris')::time < '08:00'::time AND (end_date AT TIME ZONE 'Europe/Paris')::time > '18:00'::time
                                then ('08:00'::time - (begin_date AT TIME ZONE 'Europe/Paris')::time) + ((end_date AT TIME ZONE 'Europe/Paris')::time - '18:00'::time)
                         end
                     -- --------------------------------------time_begin---------------------------------time_end----
                     --          begin_date-----------------------------------end_date
                     when ((begin_date AT TIME ZONE 'Europe/Paris')::time NOT BETWEEN '08:00'::time AND '18:00'::time)
                              AND
                          ((end_date AT TIME ZONE 'Europe/Paris')::time BETWEEN '08:00'::time AND '18:00'::time)
                         then '08:00'::time - (begin_date AT TIME ZONE 'Europe/Paris')::time

                      -- ----------time_begin---------------------------------time_end----
                     --                                 begin_date-----------------------------------end_date
                     when ((begin_date AT TIME ZONE 'Europe/Paris')::time BETWEEN '08:00'::time AND '18:00'::time)
                              AND
                          ((end_date AT TIME ZONE 'Europe/Paris')::time NOT BETWEEN '08:00'::time AND '18:00'::time)
                         then (end_date AT TIME ZONE 'Europe/Paris')::time - '18:00'::time
                     end
             when (begin_date AT TIME ZONE 'Europe/Paris')::date < (end_date AT TIME ZONE 'Europe/Paris')::date --many days
                 then
                 (select COUNT(*) * interval '5 min'
                  FROM generate_series(begin_date , end_date, interval '5 min') m
                  where (m AT TIME ZONE 'Europe/Paris')::time  >= '08:00'::time
                    AND (m AT TIME ZONE 'Europe/Paris')::time <= '18:00'::time
                    AND EXTRACT(dow FROM m AT TIME ZONE 'Europe/Paris') + 1 in (2, 3)
                 )

             END) as total_excluded
      FROM "event"
      WHERE ((EXTRACT(dow FROM "event"."begin_date" AT TIME ZONE 'Europe/Paris') + 1 in (2, 3) OR
                   EXTRACT(dow FROM "event"."end_date" AT TIME ZONE 'Europe/Paris') + 1 in (2, 3) )
          AND ("event"."date_creation" AT TIME ZONE 'Europe/Paris')::date >= '2019-12-01T00:00:00'::date
          AND ("event"."date_creation" AT TIME ZONE 'Europe/Paris')::date <= '2020-01-01T00:00:00'::date
          AND  (

                      ((event.begin_date AT TIME ZONE 'Europe/Paris')::time not between '08:00'::time AND '18:00'::time)
                      OR
                      ((event.end_date AT TIME ZONE 'Europe/Paris')::time not between '08:00'::time AND '18:00'::time)
                      OR  (
                              ((event.begin_date AT TIME ZONE 'Europe/Paris')::time > '08:00'::time)
                              AND
                              ((event.end_date AT TIME ZONE 'Europe/Paris')::time < '18:00'::time))
                  ))
      order by total_excluded desc) as excluded

Data Sample for interval IN ('8am', '6pm') for Wednesday Thursday:

|---------|------------------|------------------|------------------|
|   id    |     begin_date   |     end_date     | Result(Duration) |
|---------|------------------|------------------|------------------|
|    1    | 2020-01-01 10:00 | 2020-01-01 12:00 |     02:00:00     |
|---------|------------------|------------------|------------------|
|---------|------------------|------------------|------------------|
|    2    | 2020-01-01 10:00 | 2020-01-01 20:00 |     08:00:00     |
|---------|------------------|------------------|------------------|
|---------|------------------|------------------|------------------|
|    3    | 2020-01-01 07:00 | 2020-01-01 14:00 |     06:00:00     |
|---------|------------------|------------------|------------------|
|---------|------------------|------------------|------------------|
|    4    | 2020-01-01 07:00 | 2020-01-01 19:00 |     10:00:00     |
|---------|------------------|------------------|------------------|
|---------|------------------|------------------|------------------|
|    5    | 2020-01-01 08:00 | 2020-01-03 18:00 |     20:00:00     |
|---------|------------------|------------------|------------------|
|---------|------------------|------------------|------------------|
|    6    | 2020-01-01 09:00 | 2020-01-05 17:00 |     16:00:00     |
|---------|------------------|------------------|------------------|

Data Sample for interval OUT ('8am', '6pm') for Wednesday Thursday:

|---------|------------------|------------------|------------------|
|   id    |     begin_date   |     end_date     | Result(Duration) |
|---------|------------------|------------------|------------------|
|    1    | 2020-01-01 10:00 | 2020-01-01 12:00 |     00:00:00     |
|---------|------------------|------------------|------------------|
|---------|------------------|------------------|------------------|
|    2    | 2020-01-01 10:00 | 2020-01-01 20:00 |     02:00:00     |
|---------|------------------|------------------|------------------|
|---------|------------------|------------------|------------------|
|    3    | 2020-01-01 07:00 | 2020-01-01 14:00 |     01:00:00     |
|---------|------------------|------------------|------------------|
|---------|------------------|------------------|------------------|
|    4    | 2020-01-01 07:00 | 2020-01-01 19:00 |     02:00:00     |
|---------|------------------|------------------|------------------|
|---------|------------------|------------------|------------------|
|    5    | 2020-01-01 07:00 | 2020-01-03 18:00 |     02:00:00     |
|---------|------------------|------------------|------------------|
|---------|------------------|------------------|------------------|
|    6    | 2020-01-01 09:00 | 2020-01-05 17:00 |     02:00:00     |
|---------|------------------|------------------|------------------|

The input parameters are:

  • start date and end date
  • start time and end time
  • a list of the days of the week

To reach a result where we have the total duration of events within the interval, but also the total duration of events outside the interval of hours. All this for a list of days of the given week.

Days of the week means that if an event lasts several days say from Monday to Wednesday, but in the filter there are only Monday and Tuesday, the duration of Wednesday will not be counted.

The current query works but to handle events that take place over several days I use generate_series, which is not very efficient. The question is how to improve this query.

Best Answer

Interpretation of the question

Given that an event starts at begin_date and is finished at end_date, the goal is to map this duration onto days/hours that seem like hours open to business, presumably to compute how much work time each event consumed.

Based on the query provided, the working hours are 08:00-18:00. It's not clear how you determine the working days. Sometimes the condition in the query is EXTRACT(dow FROM m) + 1 in (1), and at other places it's EXTRACT(dow FROM "event"."begin_date" AT TIME ZONE 'Europe/Paris') + 1 in (2) OR EXTRACT(dow FROM "event"."end_date" AT TIME ZONE 'Europe/Paris') + 1 in (2) )

The question says that there is a list of the days of the week in input but I don't see that in the query.

Solution with intersection of ranges

Below is a query that demonstrates how to use ranges to sum your "IN" intervals.

What it does:

  • generate the 08:00-18:00 ranges for each day between the start and end of event
  • filters the days to exclude based on the "day of week" number. If you had a calendar table instead, a join could be used.
  • computes the intersection of these ranges with the ranges of the event
  • sum the durations per event

Query

WITH working_days AS (
select
 id,
 tstzrange(
   begin_date::date + '8 hours'::interval+ n*interval '1 day'
  ,begin_date::date + '18 hours'::interval + n*interval '1 day'
 ) AS ref
from
 (select id, begin_date,
    generate_series(0, extract(days from end_date-begin_date)::int, 1) as n
  from event) AS day_numbers
where date_part('dow', begin_date::date + n*interval '1 day') between 1 and 5
),
ranges as (
 select event.id,
   ref * tstzrange(begin_date,end_date) AS t_inc
  from working_days join event using(id)
)
select id,
 sum(upper(t_inc)-lower(t_inc)) AS total_included
from ranges
group by id order by id
;

Result

With the sample data the result is

 id | total_included 
----+----------------
  1 | 02:00:00
  2 | 08:00:00
  3 | 06:00:00
  4 | 10:00:00
  5 | 30:00:00
  6 | 29:00:00

This matches the question except for id=6, but that might be because the dow filter doesn't match yours.

As for the OUT intervals, if I understand correctly I think it's just a difference, that is OUT = (end_date - begin_date) - IN for each event.