PostgreSQL – Grouping() Equivalent in PostgreSQL

aggregatepostgresqlsql serverwindow functions

I have a SQL query that runs in MS SQL Server and it has grouping(), rank() and rollup() functions / keywords in it. The query is given below.

How can this be translated to PostgreSQL?

select top 100
    sum(ss_net_profit)/sum(ss_ext_sales_price) as gross_margin  
   ,i_category  
   ,i_class  
   ,grouping(i_category)+grouping(i_class) as lochierarchy  
   ,rank() over (partition by grouping(i_category)+grouping(i_class)
                             ,case when grouping(i_class) = 0 then i_category end   
                 order by sum(ss_net_profit)/sum(ss_ext_sales_price) asc
                ) as rank_within_parent  
 from  
    ssv_itemv_storev_IJGBAgg  
 where  
    d_year = 2001   
    and s_state in ('TN')  
 group by rollup(i_category,i_class)  
 order by  
    lochierarchy desc  
   ,case when (grouping(i_category)+grouping(i_class) = 0) then i_category end  
   ,rank_within_parent;   

Best Answer

ROLLUP was implemented in Postgres 9.5.


Original answer for Postgres 9.4:

The rollup() in group by rollup(i_category,i_class) causes aggregation in hierarchical steps. It's an extension of the ISO SQL standard in SQL Server that's not implemented in Postgres. For two items, you need three steps in standard SQL - as well as in Postgres:

  1. aggregate total
  2. group by i_category
  3. group by i_category, i_class

The grouping() function comes with this extension and indicates whether a given column is aggregated in each row. It's used here to sort the total and group sums first in the result.

rank() is a standard window function that works the same in Postgres, but we have to rewrite it due to the above adjustments.

top 100 translates to FETCH FIRST 100 ROWS ONLY in standard SQL, which is implemented in Postgres as well as the shorter LIMIT 100 (FETCH FIRST syntax works in SQL Server for 2012+ versions, too.) It's a bit odd to cut off after 100 rows in this aggregate query.

I am using a CTE to reuse the intermediary results in multiple aggregations below. Except for the LIMIT everything should basically work in SQL Server just as well:

WITH cte AS (
   SELECT sum(ss_net_profit) AS sum_profit
        , sum(ss_ext_sales_price) AS sum_price
        , i_category, i_class
   FROM   ssv_itemv_storev_IJGBAgg
   WHERE  d_year = 2001
   AND    s_state = 'TN'
   GROUP  BY i_category, i_class
   )
SELECT sum(sum_profit)/sum(sum_price) AS gross_margin
     , NULL AS i_category, NULL AS i_class
     , 2 AS lochierarchy  
     , 1 AS rank_within_parent
FROM   cte

UNION ALL
(  -- parentheses required!
SELECT sum(sum_profit)/sum(sum_price) AS gross_margin
     , i_category, NULL AS i_class
     , 1 AS lochierarchy  
     , rank() OVER (ORDER BY sum(sum_profit)/sum(sum_price)) AS rank_within_parent
FROM   cte
GROUP  BY i_category
ORDER  BY rank_within_parent, i_category  -- last item = my addition to break ties
)

UNION ALL
(
SELECT sum_profit/sum_price AS gross_margin
     , i_category, i_class
     , 0 AS lochierarchy  
     , rank() OVER (PARTITION BY i_category
                    ORDER BY sum_profit/sum_price) AS rank_within_parent
FROM   cte
ORDER  BY i_category, rank_within_parent, i_class  -- last item = my addition to break ties
)
LIMIT 100;