PostgreSQL – How to Use GROUP BY on View Queries

postgresql

PostgreSQL 9.5.16

I'm writing a query which contains several aggregate columns and many ungrouped columns. I've condensed the core logic.

SELECT min(table_b.aggregate_column),
table_a1_key,
table_a2_key,
table_a1_column
FROM api.view_a
JOIN api.table_b USING (table_a2_column)
GROUP BY table_a1_key, table_a2_key;

view_a joins table_a1 and table_a2 and exposes table_a1.table_a1_key + table_a2.table_a2_key. But I get this error when executing:

ERROR:  column "view_a.table_a1_column" must appear in the GROUP BY clause or be used in an aggregate function

Am I required to include other columns in the GROUP BY clause, even though they are functionally dependent on my grouped columns?

From docs:

When GROUP BY is present, or any aggregate functions are present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or when the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column. A functional dependency exists if the grouped columns (or a subset thereof) are the primary key of the table containing the ungrouped column.

Best Answer

You are right that table_a1_column is functionally dependent on the other columns, but PostgreSQL isn't smart enough to deduce that.

If view_a were a regular table, PostgreSQL would know that the columns in the GROUP BY clause contain the primary key, but since it is hidden behind the query rewrite rule of a view, it cannot infer that.

For PostgreSQL there are no functional dependencies between the columns of a view. It might be technically possible to improve that, but it is doubtful whether that would be worth the extra effort during query planning, particularly since the query can be fixed by simply adding all columns to the GROUP BY clause.