Here's a simple example schema:
CREATE TABLE parents (
parent_id INT,
type text null,
PRIMARY KEY (parent_id)
);
CREATE TABLE children (
child_id INT,
parent_id INT,
name text,
PRIMARY KEY (child_id)
);
INSERT INTO parents (parent_id, type) VALUES (1, null);
INSERT INTO children (child_id, parent_id, name) VALUES (1, 1, 'foo');
This query works:
SELECT child_id
FROM children
JOIN parents USING (parent_id)
GROUP BY child_id, parent_id
If I change that to SELECT child_id, name
it still works, as it should because name
is functionally dependent on child_id
and child_id
is in the GROUP BY clause.
Now if I change it to SELECT child_id, type
I think it should still work, because type
is functionally dependent on parent_id
and parent_id
is in the GROUP BY clause. But instead I get an error message:
column "parents.type" must appear in the GROUP BY clause or be used in an aggregate function
What's the difference? Does it have anything to do with the JOIN?
Best Answer
I found out what is the difference.
When I write
Postgres seems to assume I mean
GROUP BY child_id, children.parent_id
and even though those two are guaranteed to be identical by the JOIN, they don't seem to count as functionally dependent. If instead I writePostgres recognizes the functional dependency and allows
type
to be in the SELECT expression.