Postgresql – Sum Interval of Dates within same column

Tags: datetime, postgresql, sum

How do you best sum the differences between dates in the same column across interleaved rows? I have a datetime column and want to calculate the difference, in seconds, between rows. This question is not about how to get the difference between two timestamps; it is about how to calculate differences between rows of the same table most efficiently. In my case, each row has an eventtype that logically links two rows together.

Details on how the START and END event types are paired (in response to Andriy M's question):
STARTs and ENDs should be consecutive. If a START has no subsequent END, it should be left out of the sum, and the next START should be checked to see whether it has an END. Only consecutive START/END pairs should contribute to the total number of seconds.

Working in postgresql 9.x…

Example data in the table:

eventtype, eventdate
START, 2015-01-01 14:00
END, 2015-01-01 14:25
START, 2015-01-01 14:30
END, 2015-01-01 14:43
START, 2015-01-01 14:45
END, 2015-01-01 14:49
START, 2015-01-01 14:52
END, 2015-01-01 14:55

Note: all START and END dates will be sequential.
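
To experiment with the queries on this page, the sample rows above could be loaded as in the following sketch. The table name atable and the column types (text and timestamp) are assumptions; the question does not state the real schema.

```sql
-- Hypothetical schema: the name "atable" and the column types are assumptions.
CREATE TABLE atable (
    eventtype text      NOT NULL,
    eventdate timestamp NOT NULL
);

-- The eight sample rows from the question.
INSERT INTO atable (eventtype, eventdate) VALUES
    ('START', '2015-01-01 14:00'),
    ('END',   '2015-01-01 14:25'),
    ('START', '2015-01-01 14:30'),
    ('END',   '2015-01-01 14:43'),
    ('START', '2015-01-01 14:45'),
    ('END',   '2015-01-01 14:49'),
    ('START', '2015-01-01 14:52'),
    ('END',   '2015-01-01 14:55');
```

With this data the four START/END pairs last 25, 13, 4 and 3 minutes, so a correct query should report 45 minutes, that is 2700 seconds.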

Here's my first attempt. Seems to be working.

SELECT 
-- starts.*
SUM(EXTRACT(EPOCH FROM (eventdate_next - eventdate))) AS duration_seconds
FROM
( 
    WITH x AS (
        SELECT *, dense_rank() OVER (ORDER BY eventdate) AS rnk
        FROM   atable  -- placeholder name; "table" itself is a reserved word
        WHERE  eventdate > '2015-01-01 00:00:00.00'
        AND eventdate < '2016-01-01 23:59:59.59' 
        )
    SELECT x.eventdate, x.eventtype, y.eventdate AS eventdate_next,  y.eventtype AS eventtype_next
    FROM   x
    LEFT   JOIN (SELECT DISTINCT eventdate, eventtype, rnk FROM x) y ON y.rnk = (x.rnk + 1)
    ORDER  BY x.eventdate
) starts
WHERE
eventtype = 'START'   
GROUP BY eventtype 

My first attempt is based on a great example from Stack Overflow: "Postgres 9.1 – Getting the next value".

Note: you can comment out the GROUP BY and the SUM and uncomment starts.* to get one row for each individual duration that goes into the sum.

Best Answer

You can use the LEAD analytic function to obtain the next row's eventtype and eventdate alongside the current row's data:

SELECT
  eventtype,
  eventdate,
  LEAD(eventtype) OVER (ORDER BY eventdate) AS nexttype,
  LEAD(eventdate) OVER (ORDER BY eventdate) AS nextdate
FROM
  atable
WHERE
      eventdate >= '2015-01-01 00:00:00.00'
  AND eventdate <  '2016-01-01 23:59:59.59'

Using the above query as a derived table, you can filter the output further on eventtype = 'START' AND nexttype = 'END' and get the difference total:

SELECT
  SUM(EXTRACT(EPOCH FROM (nextdate - eventdate))) AS duration_seconds
FROM
  (
    SELECT
      eventtype,
      eventdate,
      LEAD(eventtype) OVER (ORDER BY eventdate) AS nexttype,
      LEAD(eventdate) OVER (ORDER BY eventdate) AS nextdate
    FROM
      atable
    WHERE
          eventdate >= '2015-01-01 00:00:00.00'
      AND eventdate <  '2016-01-01 23:59:59.59'
  ) AS s
WHERE
      eventtype = 'START'
  AND nexttype  = 'END'
;
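
The following sketch illustrates how the eventtype = 'START' AND nexttype = 'END' filter handles a START that has no matching END. The data is made up for the illustration: the START at 14:30 is followed directly by another START, so it should be left out of the sum.

```sql
SELECT
  SUM(EXTRACT(EPOCH FROM (nextdate - eventdate))) AS duration_seconds
FROM
  (
    SELECT
      eventtype,
      eventdate,
      LEAD(eventtype) OVER (ORDER BY eventdate) AS nexttype,
      LEAD(eventdate) OVER (ORDER BY eventdate) AS nextdate
    FROM
      (VALUES
        ('START', TIMESTAMP '2015-01-01 14:00'),
        ('END',   TIMESTAMP '2015-01-01 14:25'),
        ('START', TIMESTAMP '2015-01-01 14:30'),  -- dangling START, no END follows
        ('START', TIMESTAMP '2015-01-01 14:45'),
        ('END',   TIMESTAMP '2015-01-01 14:49')
      ) AS v (eventtype, eventdate)
  ) AS s
WHERE
      eventtype = 'START'
  AND nexttype  = 'END'
;
```

Only the pairs 14:00 to 14:25 (1500 seconds) and 14:45 to 14:49 (240 seconds) pass the filter, so the result is 1740 seconds. Dropping the SUM and selecting eventdate, nextdate and the EXTRACT expression instead would list each pair's duration on its own row.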

As a slight variation, you can implement the subquery as a CTE:

WITH cte AS
  (
    SELECT
      eventtype,
      eventdate,
      LEAD(eventtype) OVER (ORDER BY eventdate) AS nexttype,
      LEAD(eventdate) OVER (ORDER BY eventdate) AS nextdate
    FROM
      atable
    WHERE
          eventdate >= '2015-01-01 00:00:00.00'
      AND eventdate <  '2016-01-01 23:59:59.59'
  )
SELECT
  SUM(EXTRACT(EPOCH FROM (nextdate - eventdate))) AS duration_seconds
FROM
  cte
WHERE
      eventtype = 'START'
  AND nexttype  = 'END'
;

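This rewrite can have implications for performance, because unlike a derived table, a CTE is always materialised in PostgreSQL 9.x, where it acts as an optimisation fence. (Since PostgreSQL 12, a non-recursive, side-effect-free CTE that is referenced only once can be inlined instead.) Testing should reveal if there is a difference and, if so, which option is better for you.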
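
One way to run such a test is to compare the execution plans of both variants with EXPLAIN. This is a sketch that assumes the atable used in the queries above exists and holds representative data. Note that ANALYZE actually executes the statement, and that timings can vary between runs.

```sql
-- Derived-table variant.
EXPLAIN (ANALYZE, BUFFERS)
SELECT
  SUM(EXTRACT(EPOCH FROM (nextdate - eventdate))) AS duration_seconds
FROM
  (
    SELECT
      eventtype,
      eventdate,
      LEAD(eventtype) OVER (ORDER BY eventdate) AS nexttype,
      LEAD(eventdate) OVER (ORDER BY eventdate) AS nextdate
    FROM
      atable
    WHERE
          eventdate >= '2015-01-01 00:00:00.00'
      AND eventdate <  '2016-01-01 23:59:59.59'
  ) AS s
WHERE
      eventtype = 'START'
  AND nexttype  = 'END'
;

-- CTE variant.
EXPLAIN (ANALYZE, BUFFERS)
WITH cte AS
  (
    SELECT
      eventtype,
      eventdate,
      LEAD(eventtype) OVER (ORDER BY eventdate) AS nexttype,
      LEAD(eventdate) OVER (ORDER BY eventdate) AS nextdate
    FROM
      atable
    WHERE
          eventdate >= '2015-01-01 00:00:00.00'
      AND eventdate <  '2016-01-01 23:59:59.59'
  )
SELECT
  SUM(EXTRACT(EPOCH FROM (nextdate - eventdate))) AS duration_seconds
FROM
  cte
WHERE
      eventtype = 'START'
  AND nexttype  = 'END'
;
```

When the CTE is materialised, its plan contains a "CTE Scan" node. Comparing the total runtimes and buffer counts of the two plans over several runs should show whether the difference matters for your data.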