PostgreSQL 9.5.16
I'm writing a query which contains several aggregate columns and many ungrouped columns. I've condensed the core logic.
SELECT min(table_b.aggregate_column),
table_a1_key,
table_a2_key,
table_a1_column
FROM api.view_a
JOIN api.table_b USING (table_a2_column)
GROUP BY table_a1_key, table_a2_key;
view_a
joins table_a1
and table_a2
and exposes table_a1.table_a1_key
+ table_a2.table_a2_key
. But I get this error when executing:
ERROR: column "view_a.table_a1_column" must appear in the GROUP BY clause or be used in an aggregate function
Am I required to include other columns in the GROUP BY clause, even though they are functionally dependent on my grouped columns?
From docs:
When GROUP BY is present, or any aggregate functions are present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or when the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column. A functional dependency exists if the grouped columns (or a subset thereof) are the primary key of the table containing the ungrouped column.
Best Answer
You are right that
table_a1_column
is functionally dependent on the other columns, but PostgreSQL isn't smart enough to deduce that.If
view_a
were a regular table, PostgreSQL would know that the columns in theGROUP BY
clause contain the primary key, but since it is hidden behind the query rewrite rule of a view, it cannot infer that.For PostgreSQL there are no functional dependencies between the columns of a view. It might be technically possible to improve that, but it is doubtful whether that would be worth the extra effort during query planning, particularly since the query can be fixed by simply adding all columns to the
GROUP BY
clause.