Postgresql – 2 Level pivot using Postgresql

pivotpostgresql

I have a table whose schema along with data (table_name : raw_data) appears to be this :

name | category | clear_date |

A    | GOOD     | 2020-05-30 |
A    | GOOD     | 2020-05-30 |
A    | GOOD     | 2020-05-30 |
A    | GOOD     | 2020-05-30 |
A    | BAD      | 2020-05-30 |
A    | BAD      | 2020-05-30 |

Now if I perform a "groupby" operation using the following statement :

 SELECT name, category, date(clear_date), count(clear_date)
 FROM raw_data
 GROUP BY name, category, date(clear_date)
 ORDER BY name

I get the following answer :

  name | caetgory | date      | count |
  A    | GOOD     |2020-05-30 |  4    |
  A    | BAD      |2020-05-30 |  1    |
  A    | BAD      |2020-05-31 |  1    |

IN order to produce the pivot in following format :

 name | category | 2020-05-30 | 2020-05-31 |
 A    | GOOD     | 4          |   NULL     | 
 A    | BAD      | 1          |   1        |

I am using the following query :

 select * from crosstab (
    'select name, category, date(clear_date), count(clear_date) from raw_data group by name, category, date(clear_date) order by 1,2,3',
    'select distinct date(clear_date) from raw_data order by 1'
    )
    as newtable (
    node_name varchar, alarm_name varchar, "2020-05-30" integer, "2020-05-31" integer
)
ORDER BY name

But I am getting results as follows :

    name | category | 2020-05-30 | 2020-05-31 |
    A    | BAD      |    4       | 1          |

Can anyone please try to suggest how can i achieve the result mentioned above. It appears crosstab removes the duplicate entry of A automatically.

Best Answer

For crosstab(), the "row_name" (cagetory in your example) must come before "extra" columns (name in your example).

Assuming name is indeed an "extra" column like the sample data suggests, this would do it:

SELECT name, category, "2020-05-30", "2020-05-31"
FROM   crosstab (
   'select category, name, date(clear_date), count(clear_date)::int from raw_data group
    group by 1,2,3 order by 1 DESC,2,3',
   'select distinct date(clear_date) from raw_data order by 1'
   )
   AS ct(category text, name text, "2020-05-30" integer, "2020-05-31" integer)
ORDER  BY name;

See:

I left the 2nd function parameter 'select distinct date(clear_date) from raw_data order by 1' unchanged. But I would provide constants there, as you need to adjust the column definition list to match anyway.