Lets say I have two tables. One of them references the other in a one-to-many relationship:
CREATE TABLE t (
id SERIAL PRIMARY KEY
, name text
);
INSERT INTO t (name) VALUES ('one'), ('two'), ('three');
CREATE TABLE y (
id SERIAL PRIMARY KEY
, tid INTEGER REFERENCES t
, amount INTEGER
);
INSERT INTO y (tid, amount) VALUES
(1, 1),
(1, 2),
(1, 3),
(2, 1),
(3, 1);
I can query data from these tables in order to get t
rows matching a certain condition set on t
and y
:
SELECT row_to_json(t.*) as "t_nogroup" FROM t
JOIN y ON t.id = y.tid
WHERE t.id = 1
AND y.amount > 1;
This yields
t_nogroup
=========
{"id":1,"name":"one"}
{"id":1,"name":"one"}
Which works, yes, but contains duplicates. I can filter out the duplicates by using a GROUP BY
:
SELECT row_to_json(t.*) as "t_group" FROM t
JOIN y ON t.id = y.tid
WHERE t.id = 1
AND y.amount > 1
GROUP BY t.id;
Making:
t_group
=======
{"id":1,"name":"one"}
Great! However, when I try to exchange t
and y
for non-materialized VIEW
s:
CREATE VIEW vt AS SELECT t.* FROM t;
CREATE VIEW vy AS SELECT y.* FROM y;
And run the same queries with the new VIEW
s
SELECT row_to_json(vt.*) as "view_t_nogroup" FROM vt
JOIN vy ON vt.id = vy.tid
WHERE vt.id = 1
AND vy.amount > 1;
SELECT row_to_json(vt.*) as "view_t_group" FROM vt
JOIN vy ON vt.id = vy.tid
WHERE vt.id = 1
AND vy.amount > 1
GROUP BY vt.id;
I then get the error for the second query:
ERROR: column "vt.*" must appear in the GROUP BY clause or be used in an aggregate function
Why does this happen? Whatever logic that saw that *
contained id
when using the table t
should still work for the view vt
, right?
Fiddle link: http://sqlfiddle.com/#!17/26b0e/5
I'm running PostgreSQL 9.6. I'm also aware that this is a contrived example, but I am running into this error in a very similar one to many relation where multiple GROUP BY
statements are needed and so just using DISTINCT
isn't an option, like it would be for this example.
Best Answer
The problem appears because Postgres knows that table
t
has(id)
as the primary key - so we can useGROUP BY t.id
and have more columns from that table in theSELECT
list without aggregating them - but it can't infer the same for viewvt
.For the examples provided, you can rewrite them as not to use
GROUP BY
at all. Since you don't need any column from the second joined table in the select list, you can convert theJOIN
to anEXISTS
subquery, i.e. convert the join to a semi-join. Now both the tables and views semijoin work without errors: