Postgresql – Converting rows to columns

pivotpostgresqlpostgresql-9.1

I have data like following:

    created_at       |    status    
---------------------+-------------
 2016-04-05 1:27:15  | info
 2016-04-05 3:27:15  | info
 2016-04-05 5:27:15  | warn
 2016-04-05 10:27:15 | info
 2016-04-05 11:27:15 | warn

With this data, I want to convert like as follows:

 status  | 2016-04-05 1:00:00 | 2016-04-05 4:00:00 | 2016-04-05 8:00:00 | 2016-04-05 12:00:00
---------+--------------------+--------------------+--------------------+-------------------
 info    | 1                  | 1                  | 0                  | 1
 warn    | 0                  | 0                  | 1                  | 1                  

Can anyone suggest the best way to do this?

Best Answer

Assuming 2016-04-05 0:27:15 instead of 2016-04-05 1:27:15 in the underlying table, the question would make more sense to me:

CREATE TABLE tbl (created_at timestamp, status text);
INSERT INTO tbl VALUES
  ('2016-04-05 00:27:15', 'info')
, ('2016-04-05 03:27:15', 'info')
, ('2016-04-05 05:27:15', 'warn')
, ('2016-04-05 10:27:15', 'info')
, ('2016-04-05 11:27:15', 'warn');

The logic would be to count events that happened up to and excluding the next bound. This fits the often overlooked function width_bucket() perfectly. To be precise, it requires the variant with arbitrary bounds (since there is no regular pattern in the OP's bounds) introduced with Postgres 9.5. Explanation straight from the manual:

width_bucket(operand anyelement, thresholds anyarray)

return the bucket number to which operand would be assigned given an array listing the lower bounds of the buckets; returns 0 for an input less than the first lower bound; the thresholds array must be sorted, smallest first, or unexpected results will be obtained

For regular buckets you can use another variant that's available in Postgres 9.1 as well.
Combine it with crosstab() re-using the same bounds as column names (the rest of the query works with Postgres 9.1):

SELECT * FROM crosstab(
 $$SELECT status
        , width_bucket(created_at, '{2016-04-05 01:00
                                   , 2016-04-05 04:00
                                   , 2016-04-05 08:00
                                   , 2016-04-05 12:00}'::timestamp[])
        , count(*)::int
   FROM   tbl
   WHERE  created_at < '2016-04-05 12:00'  -- exclude later rows
   GROUP  BY 1, 2
   ORDER  BY 1, 2$$
, 'SELECT generate_series(0,3)'
   ) AS t(status text, "2016-04-05 01:00" int
                     , "2016-04-05 04:00" int
                     , "2016-04-05 08:00" int
                     , "2016-04-05 12:00" int);

Result:

 status | 2016-04-05 01:00 | 2016-04-05 04:00 | 2016-04-05 08:00 | 2016-04-05 12:00
--------+------------------+------------------+------------------+------------------
 info   |                1 |                1 |                  |  1
 warn   |                  |                  |                1 |  1

The second crosstab parameter ('SELECT generate_series(0,3)') is a query string when executed returning one row for every target column. Every value not found on either side - not in the raw data or not generated by the 2nd parameter - is simply ignored.

Basics for crosstab():

Replace NULL with 0

If you need 0 instead of NULL in the result, fix with COALESCE(), but that's merely a cosmetic problem:

SELECT status
     , COALESCE(t0, 0) AS "2016-04-05 01:00"
     , COALESCE(t1, 0) AS "2016-04-05 04:00"
     , COALESCE(t2, 0) AS "2016-04-05 08:00"
     , COALESCE(t3, 0) AS "2016-04-05 12:00"
FROM   crosstab(
 $$SELECT status
        , width_bucket(created_at, '{2016-04-05 01:00
                                   , 2016-04-05 04:00
                                   , 2016-04-05 08:00
                                   , 2016-04-05 12:00}'::timestamp[])
        , count(*)::int
   FROM   tbl
   WHERE  created_at < '2016-04-05 12:00'
   GROUP  BY 1, 2
   ORDER  BY 1, 2$$
, 'SELECT generate_series(0,3)'
   ) AS t(status text, t0 int, t1 int, t2 int, t3 int);

Result:

 status | 2016-04-05 01:00 | 2016-04-05 04:00 | 2016-04-05 08:00 | 2016-04-05 12:00
--------+------------------+------------------+------------------+------------------
 info   |                1 |                1 |                0 |  1
 warn   |                0 |                0 |                1 |  1

Adding totals

To add totals per status use the new GROUPING SETS in Postgres 9.5+

SELECT status
     , COALESCE(t0, 0) AS "2016-04-05 01:00"
     , COALESCE(t1, 0) AS "2016-04-05 04:00"
     , COALESCE(t2, 0) AS "2016-04-05 08:00"
     , COALESCE(t3, 0) AS "2016-04-05 12:00"
     , COALESCE(t4, 0) AS total
FROM   crosstab(
 $$SELECT status, COALESCE(slot, -1), ct  -- special slot for totals
   FROM  (
      SELECT status
           , width_bucket(created_at, '{2016-04-05 01:00
                                      , 2016-04-05 04:00
                                      , 2016-04-05 08:00
                                      , 2016-04-05 12:00}'::timestamp[]) AS slot
           , count(*)::int AS ct
      FROM   tbl
      WHERE  created_at < '2016-04-05 12:00'
      GROUP  BY GROUPING SETS ((1, 2), 1)  -- add totals per status
      ORDER  BY 1, 2
      ) sub$$
 , 'VALUES (0), (1), (2), (3), (-1)'  -- switched to VALUES for more sophisticated series
   ) AS t(status text, t0 int, t1 int, t2 int, t3 int, t4 int);

Result like above, plus:

...  | total
... -+-------
...  |     3
...  |     2

Note that total includes all rows not excluded before aggregation, even if filtered by crosstab().

This is in reply to @VĂ©race's request in the comments rather than to the unclear question.