Postgresql – Querying sums of grouped consecutive rows in PostgreSQL 9

postgresqlwindow functions

I have data about people traveling in different countries like this:

country | begintimestamp      | distance    

Germany | 2015-01-01 00:00:00 | 100
Germany | 2015-01-01 01:12:13 | 30
France  | 2015-01-01 02:13:14 | 40
France  | 2015-01-01 03:14:15 | 20
Spain   | 2015-01-01 04:15:16 | 10
France  | 2015-01-01 05:16:17 | 30
France  | 2015-01-01 05:17:18 | 5
Germany | 2015-01-01 06:18:19 | 3

What I need is to be able to receive a result like this – the distance of consecutive rows summed with the earliest begintimestamp:

country | begintimestamp      | distance

Germany | 2015-01-01 00:00:00 | 130  // 100+30, the distance of two first rows summed.
France  | 2015-01-01 02:13:14 | 60   // 40+20
Spain   | 2015-01-01 04:15:16 | 10   // 
France  | 2015-01-01 05:16:17 | 35   // 30+5
Germany | 2015-01-01 06:18:19 | 3

I've tried to play around with PG window functions but have not been able to come up with anything that would lead me closer to the result.

Best Answer

select min(country) as country,
       min(begintimestamp) as first_begin_ts, 
       sum(distance) as distance
from (
  select t1.*, 
         sum(group_flag) over (order by begintimestamp) as grp
  from (
      select *,
             case
                when lag(country) over (order by begintimestamp) = country then null
                else 1
              end as group_flag
      from travel
  ) t1
) t2
group by grp
order by first_begin_ts;

The inner most query (alias t1) creates a number each time the country changes). The second level query (alias t2) then does a running sum on those flags, which essentially gives each consecutive set of countries a different number. The outer most query then groups by that number and sums the distance. The min(country) is necessary to make the group by operator happy, but as all rows with the same grp have the same country anyway, it doesn't matter.

SQLFiddle: http://sqlfiddle.com/#!15/fe341/1