Postgresql – best way to avoid redundant aggregate functions and/or group by columns

performancepostgresqlpostgresql-9.3query-performancesubquery

Suppose I have two tables:

Foo:

id
baz

Bar:

id
foo_id
boom

So a Foo has many Bars. I frequently find myself in situations where I need to compute an aggregate across the Bars for a given set of Foos but I also want some properties from Foo. The two most straightforward ways of doing this are ugly:

Method #1: Unnecessary Aggregate Functions

select
  foo.id,
  min(foo.baz) as baz,
  min(bar.boom) as min_boom
from
  foo
join
  bar on foo.id = bar.foo_id
group by
  foo.id;

Method #2: Unnecessary Group By Column

select
  foo.id,
  foo.baz,
  min(bar.boom) as min_boom
from
  foo
join
  bar on foo.id = bar.foo_id
group by
  foo.id,
  foo.baz;

This is not so bad when there's only one additional column from Foo besides "id", but if there are many columns that need to be included then the grouping becomes much less efficient. A query like this gets around both these issues, but seems unwieldy:

select
  foo.id,
  foo.baz,
  x.min_boom
from
  foo
join
  (select
    foo_id, 
    min(boom) as min_boom
  from
    bar
  group by
    foo_id) x on x.foo_id = foo.id;

Is there a better way? Platform is Postgres if that matters.

Best Answer

If id is defined as the primary key, you can omit grouping by all the foo columns you want for the output as long as you are grouping by the id. This special case of grouping is in accordance with the current SQL standard and has also been covered in the PostgreSQL manual, starting from version 9.1:

When GROUP BY is present, or any aggregate functions are present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or when the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column. A functional dependency exists if the grouped columns (or a subset thereof) are the primary key of the table containing the ungrouped column.

(Emphasis added.)

So, if foo.id is the PK, this query would be valid:

select
  foo.id,
  foo.baz,
  foo.whatever,
  min(bar.boom) as min_boom
from
  foo
join
  bar on foo.id = bar.foo_id
group by
  foo.id;