Postgresql – Calculate Difference in Overlapping Time in PostgreSQL / SSRS

gaps-and-islandspostgresqlssrstime

I'm using PostgreSQL 9.2 and SSRS 3.0. I'm trying to calculate the difference in non-overlapping time entries in a timesheet.

My data looks like this:

data

I would like this to return:

results

I can get the overlapping minutes 635, but am having trouble with the not overlapping, 605 minutes.

Best Answer

Working with this setup:

CREATE TABLE tbl (
   tbl_id     serial PRIMARY KEY
,  user_id    text NOT NULL
,  start_time timestamp NOT NULL  -- if values could be NULL ...
,  end_time   timestamp NOT NULL  -- ... it would be more complicated
);    

Total duration without redundancy

Your numbers indicate you just want the total duration without counting same ranges of time multiple times. Not trivial, but can be done with pure SQL rather efficiently:

SELECT sum(duration) AS total_duration_without_redundancy
FROM  (
   SELECT island, max(end_time) - min(start_time) AS duration
   FROM  (
      SELECT start_time, end_time
           , count(gap) OVER (ORDER BY rn) AS island
      FROM  (
         SELECT start_time, end_time
              , (start_time > max(end_time) OVER w) OR NULL AS gap
              , row_number() OVER w AS rn
         FROM   tbl
         -- WHERE  user_id = 'User1'  -- just for a given user?
         WINDOW w AS (ORDER BY start_time, end_time DESC  -- longest range 1st
                      ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
         ) sub1
      ) sub2
   GROUP  BY island
   ) sub3;

Explanation:

  • Beginning at subquery sub1, first order rows by start_time. While being at it, I actually use ORDER BY start_time, end_time DESC hoping to help performance a bit. But that's unimportant.

  • The custom frame definition (ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) allows to compute the maximum end_time for the previous row with the window aggregate function max() with . If the current start_time is later, we have a gap.

  • In sub2 count the gaps to form islands. (All rows with the same count of gaps live on the same island (can be aggregated to one gap-less range).

  • The rest is simple, in sub3 compute the size of each island (duration), and get the total in the outer SELECT.

Related:

Only ranges without overlap

non-overlapping time entries like you wrote, can be interpreted in various ways. Excluding all ranges that overlap (consider only ranges of time that are covered exactly once) is hard to solve with pure SQL.

The special difficulty is that the "difference" of two ranges can produce two disjoint ranges if the one is a sub-range of the other, like the Postgres manual states for range functions:

The union and difference operators will fail if the resulting range would need to contain two disjoint sub-ranges, as such a range cannot be represented.

I would therefore solve the problem procedurally. Iterate though rows ordered by start_time And compare each range to the remaining hopeful range from the last round.

This plpgsql function sums all non-overlapping time ranges (or parts of ranges):

CREATE OR REPLACE FUNCTION f_sum_no_overlap(OUT nol interval) AS
$func$
DECLARE
   a1 timestamp;  -- start and ...
   z1 timestamp;  -- ... end of current row
   a0 timestamp;  -- remaining candidate range ...
   z0 timestamp;  -- ... from previous row
   _id int;
BEGIN
nol := '0:0';     -- init!

FOR _id, a1, z1 IN 
   SELECT tbl_id, start_time, end_time
   FROM   tbl
-- WHERE  user_id = 'User1'  -- or just for a given user?
   ORDER  BY start_time, end_time DESC  -- better take longest range first
LOOP
   IF    a1 >= z0 THEN               -- range starts later
   --       a0------z0
   --                  a1--z1
      nol := nol + (z0 - a0);      
      a0 := a1;
      z0 := z1;

   ELSIF a1 >= a0 AND z1 >= z0 THEN  -- range overlaps
   --       a0------z0
   --            a1--z1
      nol := nol + (a1 - a0);
      a0 := z0;
      z0 := z1;

   ELSIF a1 >= a0 THEN               -- AND z1 < z0 - follows logically: range included
   --       a0-------z0
   --        a1--z1
      nol := nol + (a1 - a0);
      a0 := z1;  -- z0 unchaged

   ELSIF z1 <= a0 THEN               -- ignore redundant rest
   --       a0-------z0
   -- a1--z1
      -- do nothing

   ELSIF z1 <= z0 THEN               -- shrink remaining rest
   --       a0-------z0
   --     a1--z1
      a0 := z1;  -- z0 unchaged
   ELSIF z1 > z0 THEN                -- new remaining rest
   --       a0-------z0
   --     a1------------z1
      a0 := z0;
      z0 := z1; -- z0 unchaged

   ELSE                              -- only for 1st iteration: a0 & z0 are NULL
      a0 := a1;  z0 := z1;
   END IF;

   RAISE NOTICE '_id: %, nol: %, a0: %, z0:%', _id, nol, a0, z0;  -- to debug
END LOOP;

nol := nol + (z0 - a0);  -- rest from last row

END
$func$  LANGUAGE plpgsql;

SQL Fiddle
db<>fiddle here

Should be fast, too, because a single scan of the table (or index) is enough.

I added some basic ASCII art to visualize IF cases. That should help to understand.