Postgresql – How to use window functions (“over”) together with crosstab in Postgres

group bypostgresql

Table: application_app

         applied_date          | applied_class | id
-------------------------------+---------------+-----
 2013-03-06 15:59:49.22774-05  | BOSFAL13      |  13
 2013-05-12 16:56:34.749859-04 | BOSFAL13      | 232
 2013-05-26 15:48:11.969723-04 | BOSFAL13      | 359
 2013-06-02 23:02:17.267199-04 | BOSFAL13      | 400
 2013-06-03 12:35:46.90801-04  | BOSFAL13      | 408
 2013-06-08 19:17:42.179707-04 | NYCFAL13      | 449
 2013-06-11 02:04:38.232501-04 | NYCFAL13      | 450
 2013-06-11 13:05:42.509843-04 | NYCFAL13      | 452
 2013-06-12 09:19:21.459897-04 | BOSFAL13      | 454
 2013-06-13 02:38:52.782411-04 | NYCFAL13      | 456
 2013-06-17 19:41:31.813795-04 | NYCFAL13      | 457
 2013-06-18 17:39:04.892315-04 | BOSFAL13      | 459
 2013-06-19 09:05:29.892635-04 | BOSFAL13      | 460
 2013-06-21 09:06:06.380306-04 | NYCFAL13      | 462
 2013-06-22 05:02:31.699312-04 | BOSFAL13      | 463

I'm trying to use crosstab to get an aggregate of applied_class over time (by day).

Current Query

select *

from crosstab(
    'select temp_table.d,  
       applied_class,  
       sum(temp_table.ct) over (order by d) 

    from   
       (
        select count(id) ct, 
               applied_class, 
               to_char(date_trunc($$day$$, applied_date), $$MM-DD$$) d from application_app 
        where applied_class like $$%L13$$ 
        group by applied_class, d 
        order by d
        ) as temp_table
    order by 1, 2'

    ,'VALUES ($$BOSFAL13$$::text), ($$NYCFAL13$$::text), ($$CHIFAL13$$::text), ($$MULTIPLECITIESFALL13$$::text)')  -- end crosstab

as ct ("Day" text, "Boston" int, "New York" int, "Chicago" int, "Any City" int);

This query is very close, but it's increasing the count for each applied_class whenever there is a record for any applied_class:

Results

  Day  | Boston | New York | Chicago | Any Class
-------+--------+----------+---------+-----------
 03-06 |      1 |          |         |
 05-12 |      2 |          |         |
 05-26 |      3 |          |         |
 06-02 |      4 |          |         |
 06-03 |      6 |          |         |
 06-08 |        |        7 | <-- First "New York" record, but adds prev 6 from "Boston"
 06-11 |        |       11 |         |
 06-12 |     12 |          |         |
 06-13 |        |       13 |         |
 06-17 |        |       14 |         |
 06-18 |     16 |          |         |
 06-19 |     18 |          |         |
 06-21 |        |       19 |         |

How can I change this query so that it aggregates the applied_class records by city, instead of adding the previous records from all cities?

Best Answer

You need to add a partition clause to the over():

I would expect

sum(ct) over (partition by applied_class order by d)

to do what you want.