Postgresql – Postgres Nested WHEN Aggregate Function

Tags: aggregate, aggregate-filter, count, postgresql, subquery

In PostgreSQL 9.4 I am trying to construct a query that counts how many prices in the data set fall into each of several ranges (tranches). When I group by "SettlementPointPrice", count() correctly buckets each price into its tranche, as I would expect. However, this creates hundreds of rows. I am looking for a way (a subquery?) to sum the counts in each bucket into one single row. What is the best way to do this in SQL?

I am using a statement like this (full SQL below) for each bucket/tranche:

CASE WHEN (round(sum("DA-A"."SettlementPointPrice"),2)) BETWEEN 0 AND 10
     THEN count(*) ELSE 0 END AS "DA $0 - $10",

When I group by "SettlementPointPrice" (two prices in this example), the data is counted into the correct buckets, as the table below shows.

Raw data from two individual days:

Row | "SettlementPointPrice" | 0-10 | 11-20 | 21-30
1   | 18                     | 0    | 1     | 0
2   | 22                     | 0    | 0     | 1

However, I am unable to get the aggregated totals when grouping all rows together. I assume this needs a subquery?

I would like the result to look like this:

Row | 0-10 | 11-20 | 21-30
1   | 0    | 1     | 1

Full SQL code:

SELECT
  "DA-A"."SettlementPointPrice",
  CASE WHEN (round(sum("DA-A"."SettlementPointPrice"),2)) BETWEEN 0 AND 10
       THEN count(*) ELSE 0 END AS "DA $0 - $10",
  CASE WHEN (round(sum("DA-A"."SettlementPointPrice"),2)) BETWEEN 11 AND 20
       THEN COUNT(*) ELSE 0 END AS "DA $11 - $20",
  CASE WHEN (round(sum("DA-A"."SettlementPointPrice"),2)) BETWEEN 21 AND 30
       THEN COUNT(*) ELSE 0 END AS "DA $21 - $30"
FROM 
  public.da "DA-A", 
  public.rt_aggregate "RT-A"
WHERE 
  "RT-A"."DeliveryDate" = "DA-A"."DeliveryDate" AND
  "RT-A"."SettlementPointName" = "DA-A"."SettlementPointName" AND
  "DA-A"."SettlementPointName" = 'John' AND 
  "DA-A"."DeliveryDate" >= '2015-02-01' AND
  "DA-A"."DeliveryDate" <= '2015-02-20' AND
 ("RT-A"."DeliveryHour" = 14) and 
  date_part('hour', "DA-A"."DeliveryHour") = "RT-A"."DeliveryHour"
GROUP BY
  "DA-A"."SettlementPointPrice",
  "DA-A"."SettlementPointName"

Best Answer

After some cleanup, this boils down to a single query. Since your predicate d."SettlementPointName" = 'John' filters for a single value of "SettlementPointName" anyway, there is no need to group by that column. Simplify to:

SELECT count(                                     d."SettlementPointPrice" < 10.5 OR NULL) AS da_00_10
     , count(d."SettlementPointPrice" >= 10.5 AND d."SettlementPointPrice" < 20.5 OR NULL) AS da_11_20
     , count(d."SettlementPointPrice" >= 20.5 AND d."SettlementPointPrice" < 30.5 OR NULL) AS da_21_30
FROM   public.da d
JOIN   public.rt_aggregate r USING ("DeliveryDate", "SettlementPointName")
WHERE  d."SettlementPointName" = 'John'
AND    d."DeliveryDate" >= '2015-02-01'
AND    d."DeliveryDate" <= '2015-02-20'
AND    r."DeliveryHour" = 14
AND    date_part('hour', d."DeliveryHour") = r."DeliveryHour";

About the counting technique:

For absolute performance, is SUM faster or COUNT?
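
To illustrate why the "OR NULL" trick works: count(expression) only counts non-null values, "true OR NULL" evaluates to true, and "false OR NULL" evaluates to NULL, so only rows matching the condition are counted. The following is a minimal sketch, assuming a made-up inline table t(price) that holds the two sample prices from the question (18 and 22); the table and column names are hypothetical and not part of the original schema. The sum(CASE ...) column shows an equivalent, more verbose formulation for comparison.

```sql
-- Hypothetical sample data: the two prices from the question's example table.
SELECT count(price <  10.5 OR NULL)                   AS da_00_10
     , count(price >= 10.5 AND price < 20.5 OR NULL)  AS da_11_20
     , count(price >= 20.5 AND price < 30.5 OR NULL)  AS da_21_30
       -- Equivalent result for the middle bucket with sum() and CASE:
     , sum(CASE WHEN price >= 10.5 AND price < 20.5 THEN 1 ELSE 0 END) AS da_11_20_sum
FROM  (VALUES (18), (22)) AS t(price);

-- One row: da_00_10 = 0, da_11_20 = 1, da_21_30 = 1, da_11_20_sum = 1
```

Because there is no GROUP BY, all rows collapse into a single result row, which is the shape the question asks for.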

Or, better yet, use the aggregate FILTER clause, new in Postgres 9.4:

SELECT d."SettlementPointName"
     , count(*) FILTER (WHERE d."SettlementPointPrice" <  10.5) AS da_00_10
     , count(*) FILTER (WHERE d."SettlementPointPrice" >= 10.5
                        AND   d."SettlementPointPrice" <  20.5) AS da_11_20
     , count(*) FILTER (WHERE d."SettlementPointPrice" >= 20.5
                        AND   d."SettlementPointPrice" <  30.5) AS da_21_30
FROM   public.da d
JOIN   public.rt_aggregate r USING ("DeliveryDate", "SettlementPointName")
WHERE  d."DeliveryDate" >= '2015-02-01'
AND    d."DeliveryDate" <= '2015-02-20'
AND    r."DeliveryHour" = 14
AND    date_part('hour', d."DeliveryHour") = r."DeliveryHour"
GROUP  BY 1;

This time, the query selects all names and returns one row per name, as you asked in the comment.
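
The FILTER variant can be tried without the real tables. This is a small sketch under stated assumptions: the inline table t(name, price) and its three rows are invented for illustration, and the query needs Postgres 9.4 or later.

```sql
-- Hypothetical sample data: two prices for 'John', one for 'Jane'.
SELECT name
     , count(*) FILTER (WHERE price <  10.5)                  AS da_00_10
     , count(*) FILTER (WHERE price >= 10.5 AND price < 20.5) AS da_11_20
     , count(*) FILTER (WHERE price >= 20.5 AND price < 30.5) AS da_21_30
FROM  (VALUES ('John', 18), ('John', 22), ('Jane', 5)) AS t(name, price)
GROUP  BY 1
ORDER  BY 1;

-- Jane | 1 | 0 | 0
-- John | 0 | 1 | 1
```

Each aggregate sees only the rows of its group that pass its own FILTER condition, so a name with no prices in a bucket still gets a 0 rather than a missing row.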

Details for FILTER:

Return counts for multiple ranges in a single SELECT statement

Related Solutions

Mysql – Selecting minimum value using a subquery

Slightly modifying your second query gives you both the merchant id and the lowest price (over all products that pass the conditions, which I guess is what you want):

SELECT p.p_m_id, MIN(p_price) AS min_p_price 
FROM tgmp_affiliates ga 
JOIN tgmp_prices p 
    ON ga.a_code = p.p_gtin 
        AND ga.a_code > '' 
JOIN tgmp_merchants m 
    ON m.m_id = p.p_m_id 
WHERE ga.site_id = '34' 
    AND p.site_id = '34' 
    AND ga.a_parent = '25573' 
    AND p.p_type = 'games' 
    AND m.m_hide = 0 
GROUP BY p.p_m_id ;

Then you can join this, as a derived table, to all the tables whose data you need in the results:

SELECT
    m.*, p.*, ga.*                     -- whatever columns you want  
FROM tgmp_affiliates ga 
JOIN tgmp_prices p 
    ON ga.a_code = p.p_gtin 
        AND ga.a_code > '' 
JOIN tgmp_merchants m 
    ON m.m_id = p.p_m_id 
JOIN
      ( SELECT p.p_m_id, MIN(p_price) AS p_price 
        FROM tgmp_affiliates ga 
        JOIN tgmp_prices p 
            ON ga.a_code = p.p_gtin 
                AND ga.a_code > '' 
        JOIN tgmp_merchants m 
            ON m.m_id = p.p_m_id 
        WHERE ga.site_id = '34' 
            AND p.site_id = '34' 
            AND ga.a_parent = '25573' 
            AND p.p_type = 'games' 
            AND m.m_hide = 0 
        GROUP BY p.p_m_id 
      ) AS tmp
    ON  tmp.p_m_id = p.p_m_id 
    AND tmp.p_price = p.p_price
WHERE ga.site_id = '34' 
    AND p.site_id = '34' 
    AND ga.a_parent = '25573' 
    AND p.p_type = 'games' 
ORDER BY p.p_price ;
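
The same pattern, stripped down to its core: compute the minimum per group in a derived table, then join back on both the group key and the minimum to fetch the full rows. This is only a sketch; the prices CTE and its values are invented, and it assumes a server with common table expression support (MySQL 8.0 or later, or PostgreSQL).

```sql
-- Hypothetical sample data: two merchants with two prices each.
WITH prices (m_id, price) AS (
    SELECT 1, 9.99  UNION ALL
    SELECT 1, 7.50  UNION ALL
    SELECT 2, 12.00 UNION ALL
    SELECT 2, 15.25
)
SELECT p.m_id, p.price
FROM   prices p
JOIN  (
        -- Lowest price per merchant
        SELECT m_id, MIN(price) AS price
        FROM   prices
        GROUP  BY m_id
      ) AS tmp
    ON  tmp.m_id  = p.m_id
    AND tmp.price = p.price
ORDER  BY p.price;

-- 1 |  7.50
-- 2 | 12.00
```

If several rows of one merchant share the same lowest price, all of them are returned.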

Postgresql – Returning empty string when string_agg has no records

Use COALESCE to catch and replace NULL values:

SELECT f.name AS foo
     , 'Bazzes: ' || COALESCE(string_agg(b.baz, ', '), '') AS bazzes
FROM   foo f
LEFT   JOIN bar b ON b.fooid = f.id
GROUP  BY 1;

concat() is another convenient option as you found yourself, in particular to concatenate multiple values. I suggest the variant concat_ws() ("with separator"), though, to avoid the trailing space.

concat_ws(' ', 'Bazzes:', string_agg(b.baz, ', ')) AS bazzes
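
Both variants side by side, as a minimal sketch: the CTEs foo and bar stand in for the real tables, and their contents are made up so that one foo row has matching bar rows and the other has none. The ORDER BY inside string_agg() is added only to make the output deterministic.

```sql
-- Hypothetical sample data: foo 'a' has two bazzes, foo 'b' has none.
WITH foo(id, name)   AS (VALUES (1, 'a'::text), (2, 'b'::text))
   , bar(fooid, baz) AS (VALUES (1, 'x'::text), (1, 'y'::text))
SELECT f.name AS foo
     , 'Bazzes: ' || COALESCE(string_agg(b.baz, ', ' ORDER BY b.baz), '') AS with_coalesce
     , concat_ws(' ', 'Bazzes:', string_agg(b.baz, ', ' ORDER BY b.baz))  AS with_concat_ws
FROM   foo f
LEFT   JOIN bar b ON b.fooid = f.id
GROUP  BY 1
ORDER  BY 1;

-- a | 'Bazzes: x, y' | 'Bazzes: x, y'
-- b | 'Bazzes: '     | 'Bazzes:'
```

For 'b', the COALESCE variant keeps the trailing space of the literal prefix, while concat_ws() skips the NULL argument and therefore adds no separator.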

Why `NULL`?

Almost all aggregate functions return NULL if all source fields are NULL (no non-null values, to be precise), count() being the exception for practical reasons. The manual:

It should be noted that except for count, these functions return a null value when no rows are selected. In particular, sum of no rows returns null, not zero as one might expect, and array_agg returns null rather than an empty array when there are no input rows. The coalesce function can be used to substitute zero or an empty array for null when necessary.
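
The behavior quoted from the manual can be checked with a query that aggregates over zero rows. A small sketch, assuming an invented inline table t(x) whose rows are all excluded by WHERE false:

```sql
-- No row passes the WHERE clause, so every aggregate sees an empty input.
SELECT count(x)              AS cnt            -- 0
     , sum(x)                AS total          -- NULL
     , COALESCE(sum(x), 0)   AS total_or_zero  -- 0
     , array_agg(x)          AS arr            -- NULL
FROM  (VALUES (1), (2)) AS t(x)
WHERE  false;
```

An aggregate query without GROUP BY still returns exactly one row here, which is why the NULL results become visible at all.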
