Postgresql – Converting rows to columns

Tags: pivot, postgresql, postgresql-9.1

I have data like the following:

    created_at       |    status    
---------------------+-------------
 2016-04-05 1:27:15  | info
 2016-04-05 3:27:15  | info
 2016-04-05 5:27:15  | warn
 2016-04-05 10:27:15 | info
 2016-04-05 11:27:15 | warn

I want to convert this data as follows:

 status  | 2016-04-05 1:00:00 | 2016-04-05 4:00:00 | 2016-04-05 8:00:00 | 2016-04-05 12:00:00
---------+--------------------+--------------------+--------------------+-------------------
 info    | 1                  | 1                  | 0                  | 1
 warn    | 0                  | 0                  | 1                  | 1

Can anyone suggest the best way to do this?

Best Answer

The question makes more sense to me assuming 2016-04-05 0:27:15 instead of 2016-04-05 1:27:15 in the underlying table:

CREATE TABLE tbl (created_at timestamp, status text);
INSERT INTO tbl VALUES
  ('2016-04-05 00:27:15', 'info')
, ('2016-04-05 03:27:15', 'info')
, ('2016-04-05 05:27:15', 'warn')
, ('2016-04-05 10:27:15', 'info')
, ('2016-04-05 11:27:15', 'warn');

The logic would be to count events that happened up to and excluding the next bound. This fits the often overlooked function width_bucket() perfectly. To be precise, it requires the variant with arbitrary bounds (since there is no regular pattern in the OP's bounds) introduced with Postgres 9.5. Explanation straight from the manual:

width_bucket(operand anyelement, thresholds anyarray)
return the bucket number to which operand would be assigned given an array listing the lower bounds of the buckets; returns 0 for an input less than the first lower bound; the thresholds array must be sorted, smallest first, or unexpected results will be obtained
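
To see what the thresholds variant returns for the sample rows, a quick check against the tbl table defined above might look like this. It needs Postgres 9.5 or later, and the alias slot is just an illustrative name:

```sql
-- Show the bucket number assigned to each sample row.
-- The thresholds are the four bounds taken from the desired column headers.
SELECT created_at
     , status
     , width_bucket(created_at, '{2016-04-05 01:00
                                , 2016-04-05 04:00
                                , 2016-04-05 08:00
                                , 2016-04-05 12:00}'::timestamp[]) AS slot
FROM   tbl
ORDER  BY created_at;

-- created_at          | status | slot
-- 2016-04-05 00:27:15 | info   |    0   (before the first bound)
-- 2016-04-05 03:27:15 | info   |    1
-- 2016-04-05 05:27:15 | warn   |    2
-- 2016-04-05 10:27:15 | info   |    3
-- 2016-04-05 11:27:15 | warn   |    3
```

Slots 0 through 3 map to the four target columns. A row at or after 12:00 would land in slot 4.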

For regular buckets you can use another variant that's available in Postgres 9.1 as well.
Combine it with crosstab() re-using the same bounds as column names (the rest of the query works with Postgres 9.1):

SELECT * FROM crosstab(
 $$SELECT status
        , width_bucket(created_at, '{2016-04-05 01:00
                                   , 2016-04-05 04:00
                                   , 2016-04-05 08:00
                                   , 2016-04-05 12:00}'::timestamp[])
        , count(*)::int
   FROM   tbl
   WHERE  created_at < '2016-04-05 12:00'  -- exclude later rows
   GROUP  BY 1, 2
   ORDER  BY 1, 2$$
, 'SELECT generate_series(0,3)'
   ) AS t(status text, "2016-04-05 01:00" int
                     , "2016-04-05 04:00" int
                     , "2016-04-05 08:00" int
                     , "2016-04-05 12:00" int);

Result:

 status | 2016-04-05 01:00 | 2016-04-05 04:00 | 2016-04-05 08:00 | 2016-04-05 12:00
--------+------------------+------------------+------------------+------------------
 info   |                1 |                1 |                  |  1
 warn   |                  |                  |                1 |  1

The second crosstab parameter ('SELECT generate_series(0,3)') is a query string that, when executed, returns one row for every target column. Every value not found on either side (not in the raw data, or not generated by the second parameter) is simply ignored.
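
A minimal sketch of that behavior, using hand-written VALUES in place of the real source query (the rows and the t0 to t3 column names are made up for illustration). It also shows the one-time installation of the additional module tablefunc, which provides crosstab():

```sql
-- crosstab() is part of the additional module tablefunc; install once per database.
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT * FROM crosstab(
   $$VALUES ('info'::text, 0, 1)
          , ('info'      , 4, 7)   -- category 4 is not produced by the 2nd parameter
          , ('warn'      , 2, 1)$$
 , 'SELECT generate_series(0,3)'
   ) AS t(status text, t0 int, t1 int, t2 int, t3 int);

--  status | t0 | t1 | t2 | t3
--  info   |  1 |    |    |
--  warn   |    |    |  1 |
```

The row with category 4 is dropped because the second parameter never generates that value. Categories 1 and 3 have no source rows, so their columns stay NULL.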

Basics for crosstab():

PostgreSQL Crosstab Query

Replace NULL with 0

If you need 0 instead of NULL in the result, fix with COALESCE(), but that's merely a cosmetic problem:

SELECT status
     , COALESCE(t0, 0) AS "2016-04-05 01:00"
     , COALESCE(t1, 0) AS "2016-04-05 04:00"
     , COALESCE(t2, 0) AS "2016-04-05 08:00"
     , COALESCE(t3, 0) AS "2016-04-05 12:00"
FROM   crosstab(
 $$SELECT status
        , width_bucket(created_at, '{2016-04-05 01:00
                                   , 2016-04-05 04:00
                                   , 2016-04-05 08:00
                                   , 2016-04-05 12:00}'::timestamp[])
        , count(*)::int
   FROM   tbl
   WHERE  created_at < '2016-04-05 12:00'
   GROUP  BY 1, 2
   ORDER  BY 1, 2$$
, 'SELECT generate_series(0,3)'
   ) AS t(status text, t0 int, t1 int, t2 int, t3 int);

Result:

 status | 2016-04-05 01:00 | 2016-04-05 04:00 | 2016-04-05 08:00 | 2016-04-05 12:00
--------+------------------+------------------+------------------+------------------
 info   |                1 |                1 |                0 |  1
 warn   |                0 |                0 |                1 |  1
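
For a small, fixed set of columns, the same zero-filled result can also be sketched with plain conditional aggregation. This is a different technique from the crosstab() approach above, not part of it. It needs neither tablefunc nor width_bucket() and should run on Postgres 9.1:

```sql
-- One count() per target column; count() ignores the NULL that CASE returns
-- for non-matching rows, so empty slots come out as 0 rather than NULL.
SELECT status
     , count(CASE WHEN created_at <  '2016-04-05 01:00' THEN 1 END)::int AS "2016-04-05 01:00"
     , count(CASE WHEN created_at >= '2016-04-05 01:00'
                   AND created_at <  '2016-04-05 04:00' THEN 1 END)::int AS "2016-04-05 04:00"
     , count(CASE WHEN created_at >= '2016-04-05 04:00'
                   AND created_at <  '2016-04-05 08:00' THEN 1 END)::int AS "2016-04-05 08:00"
     , count(CASE WHEN created_at >= '2016-04-05 08:00'
                   AND created_at <  '2016-04-05 12:00' THEN 1 END)::int AS "2016-04-05 12:00"
FROM   tbl
WHERE  created_at < '2016-04-05 12:00'  -- exclude later rows
GROUP  BY status
ORDER  BY status;
```

The trade-off is that every bound is written out twice, so crosstab() with width_bucket() scales better as the number of columns grows.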

Adding totals

To add totals per status, use GROUPING SETS, new in Postgres 9.5:

SELECT status
     , COALESCE(t0, 0) AS "2016-04-05 01:00"
     , COALESCE(t1, 0) AS "2016-04-05 04:00"
     , COALESCE(t2, 0) AS "2016-04-05 08:00"
     , COALESCE(t3, 0) AS "2016-04-05 12:00"
     , COALESCE(t4, 0) AS total
FROM   crosstab(
 $$SELECT status, COALESCE(slot, -1), ct  -- special slot for totals
   FROM  (
      SELECT status
           , width_bucket(created_at, '{2016-04-05 01:00
                                      , 2016-04-05 04:00
                                      , 2016-04-05 08:00
                                      , 2016-04-05 12:00}'::timestamp[]) AS slot
           , count(*)::int AS ct
      FROM   tbl
      WHERE  created_at < '2016-04-05 12:00'
      GROUP  BY GROUPING SETS ((1, 2), 1)  -- add totals per status
      ORDER  BY 1, 2
      ) sub$$
 , 'VALUES (0), (1), (2), (3), (-1)'  -- switched to VALUES for more sophisticated series
   ) AS t(status text, t0 int, t1 int, t2 int, t3 int, t4 int);

Result like above, plus:

...  | total
... -+-------
...  |     3
...  |     2

Note that total includes all rows not excluded before aggregation, even if filtered by crosstab().
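
To see where the special slot for totals comes from, the inner aggregation can be run on its own. This is the same subquery as above, without the COALESCE wrapper:

```sql
-- GROUPING SETS adds one extra row per status with slot = NULL holding the total.
SELECT status
     , width_bucket(created_at, '{2016-04-05 01:00
                                , 2016-04-05 04:00
                                , 2016-04-05 08:00
                                , 2016-04-05 12:00}'::timestamp[]) AS slot
     , count(*)::int AS ct
FROM   tbl
WHERE  created_at < '2016-04-05 12:00'
GROUP  BY GROUPING SETS ((1, 2), 1)
ORDER  BY 1, 2;

-- status | slot | ct
-- info   |    0 |  1
-- info   |    1 |  1
-- info   |    3 |  1
-- info   |      |  3   (total row, slot is NULL)
-- warn   |    2 |  1
-- warn   |    3 |  1
-- warn   |      |  2
```

COALESCE(slot, -1) then turns the NULL slot into the category -1, which the VALUES list in the second crosstab parameter maps to the last output column.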

This is in reply to @Vérace's request in the comments rather than to the unclear question.

Related Solutions

SQL unpivoting multiple rows/columns, but keeping the rows grouped together, and in the same order they were selected

This uses the same basic technique as jonearles ⁽⁺¹⁾ (since removed) but eliminates the PIVOT and would require only one function that turns the string into a CASE statement.

SELECT substr(
      MIN (
         CASE WHEN Blank=1 THEN '1 Blank' 
              WHEN Error=1 THEN '2 Error' 
              WHEN InProgress=1 THEN '3 InProgress' 
              WHEN Completed=1 THEN '4 Completed' 
         END
         )
   ,3) state_code
FROM m_object mo
JOIN m_form mf ON mo.id = mf.id
WHERE mo.v = :2 AND mf.formlayout_id = :1;

I assume the data looks something like this:

drop table m_object;
create table m_object as 
   (select 10 id, 0 blank, 0 error, 0 inprogress, 1 completed, 99 v from dual);
insert into m_object values (11,1,0,0,0,99);
insert into m_object values (12,0,0,0,1,99);
insert into m_object values (13,0,1,1,0,99);

drop table m_form;
create table m_form as (select 10 id, 20 formlayout_id from dual);
insert into m_form values (11,21);
insert into m_form values (12,22);
insert into m_form values (13,20);
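
As a quick check against this sample data, here is the query from above with literal values in place of the bind variables. The choice of 20 for :1 and 99 for :2 is my assumption for illustration only:

```sql
-- Rows 10 and 13 share formlayout_id 20.
-- Row 10 maps to '4 Completed'; row 13 maps to '2 Error' (first matching WHEN wins).
-- MIN() picks '2 Error', and substr(..., 3) strips the sort prefix.
SELECT substr(
      MIN (
         CASE WHEN Blank=1 THEN '1 Blank'
              WHEN Error=1 THEN '2 Error'
              WHEN InProgress=1 THEN '3 InProgress'
              WHEN Completed=1 THEN '4 Completed'
         END
         )
   ,3) state_code
FROM m_object mo
JOIN m_form mf ON mo.id = mf.id
WHERE mo.v = 99 AND mf.formlayout_id = 20;

-- state_code
-- Error
```

The numeric prefix exists only to make MIN() return the lowest-ranked state across the group.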

Here is what the function might look like if it were done in PL/SQL. It was just thrown together and is only meant to illustrate building the SQL statement and is not meant to be an example of good coding.

set serveroutput on format wrapped

DECLARE
vPassed Varchar2(500) := 'Blank,Error,InProgress,Completed';

   Function MakeCase(pOriginal In Varchar2) Return Varchar2 Is
      vBuilt     Varchar2(500) := 'CASE WHEN ';  -- start the expression here so a single-word list also works
      vWord      Varchar2(100);
      vChar      Char(1);
      vWordCount Number(1) := 1;
   Begin
      For vLoop In 1..Length(pOriginal) Loop
         vChar := substr(pOriginal,vLoop,1);
         If (vChar = ',') Then
            vBuilt := vBuilt || vWord || '=1 THEN '''
               || to_char(vWordCount,'FM0') || ' ' || vWord || ''' WHEN ';
            vWord := '';
            vWordCount := vWordCount + 1;            
         Else                  
            vWord := vWord || vChar;            
         End If;
      End Loop;
      vBuilt := vBuilt || vWord || '=1 THEN ''' 
         || to_char(vWordCount,'FM0') || ' ' || vWord || ''' ';
      vBuilt := vBuilt || ' END ';
      Return vBuilt;
   End;

BEGIN
   vPassed := MakeCase(vPassed);
   DBMS_Output.Put_Line(vPassed);

   --Use vPassed here to build SQL statement.
END;
/
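
One possible way to use the generated text is dynamic SQL. The following is only a sketch under assumptions: the CASE expression is pasted in as a literal (spacing tidied) instead of calling MakeCase, the bind names :v and :layout and the values 99 and 20 are chosen for illustration, and the tables are the sample tables above.

```sql
DECLARE
   -- CASE expression as MakeCase builds it for 'Blank,Error,InProgress,Completed'
   vCase  Varchar2(500) :=
         'CASE WHEN Blank=1 THEN ''1 Blank'' WHEN Error=1 THEN ''2 Error'' '
      || 'WHEN InProgress=1 THEN ''3 InProgress'' WHEN Completed=1 THEN ''4 Completed'' END';
   vSql   Varchar2(1000);
   vState Varchar2(100);
BEGIN
   -- Splice the CASE expression into the aggregate query.
   vSql := 'SELECT substr(MIN(' || vCase || '), 3) '
        || 'FROM m_object mo JOIN m_form mf ON mo.id = mf.id '
        || 'WHERE mo.v = :v AND mf.formlayout_id = :layout';

   -- Bind values are matched to the placeholders by position.
   EXECUTE IMMEDIATE vSql INTO vState USING 99, 20;
   DBMS_Output.Put_Line(vState);
END;
/
```

With the sample data and serveroutput enabled, this should print Error. Because the column list is concatenated into the statement, it should come only from trusted input.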

MySQL Rows to Columns to HBase (using Sqoop)

If you are interested in returning all the values as columns, you need to try something very adventurous. First, look at your query:

SELECT 
    CONCAT(id, '-', date),
    MAX(IF(`num` = 0, avg, NULL)) num0 
FROM table 
GROUP BY 
    id, 
    date;

It would be a big mess to get MySQL to execute this as a query with every column spelled out individually.

Perhaps you can get MySQL to concatenate the column values using GROUP_CONCAT. That function was designed for aggregation (or aggravation if you are the actual developer). You can take all the num values and display them as a comma-separated list of numbers like this:

SELECT 
    CONCAT(id, '-', date),
    GROUP_CONCAT(IF(`num` = 0, avg, 0)) numlist
FROM table 
GROUP BY 
    id, 
    date;

You can also change the list to be delimited by pipes instead of commas like this:

SELECT 
    CONCAT(id, '-', date),
    GROUP_CONCAT(IF(`num` = 0, avg, 0) SEPARATOR '|') numlist
FROM table 
GROUP BY 
    id, 
    date;

The default maximum length of a GROUP_CONCAT result is 1024 bytes (the group_concat_max_len system variable); anything longer is truncated.

You need to change that max length in the session using this:

SET group_concat_max_len = 10240;

before you issue your query.
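
Putting the pieces together, a small self-contained sketch might look like this. The table name readings and its three rows are made up for illustration, and the query concatenates the avg values directly (ordered by num) rather than using the IF() expression from the queries above:

```sql
-- Hypothetical sample table standing in for the real one.
CREATE TEMPORARY TABLE readings (id INT, `date` DATE, num INT, `avg` DECIMAL(6,2));
INSERT INTO readings VALUES
  (1, '2016-04-05', 0, 1.50)
, (1, '2016-04-05', 1, 2.25)
, (1, '2016-04-05', 2, 3.00);

-- Raise the limit for this session before running the aggregate.
SET SESSION group_concat_max_len = 10240;

SELECT CONCAT(id, '-', `date`) AS row_key
     , GROUP_CONCAT(`avg` ORDER BY num SEPARATOR '|') AS numlist
FROM   readings
GROUP  BY id, `date`;

-- row_key      | numlist
-- 1-2016-04-05 | 1.50|2.25|3.00
```

The ORDER BY inside GROUP_CONCAT keeps the values in num order, so each position in the list corresponds to a fixed num value.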