PostgreSQL – GROUP BY id Works on Table but Not on Identical View

postgresqlview

Lets say I have two tables. One of them references the other in a one-to-many relationship:

CREATE TABLE t (
  id SERIAL PRIMARY KEY
, name text
);

INSERT INTO t (name) VALUES ('one'), ('two'), ('three');

CREATE TABLE y (
  id SERIAL PRIMARY KEY
, tid INTEGER REFERENCES t
, amount INTEGER
);

INSERT INTO y (tid, amount) VALUES
(1, 1),
(1, 2),
(1, 3),
(2, 1),
(3, 1);

I can query data from these tables in order to get t rows matching a certain condition set on t and y:

SELECT row_to_json(t.*) as "t_nogroup" FROM t
JOIN y ON t.id = y.tid
WHERE t.id = 1
AND y.amount > 1;

This yields

t_nogroup
=========
{"id":1,"name":"one"}
{"id":1,"name":"one"}

Which works, yes, but contains duplicates. I can filter out the duplicates by using a GROUP BY:

SELECT row_to_json(t.*) as "t_group" FROM t
JOIN y ON t.id = y.tid
WHERE t.id = 1
AND y.amount > 1
GROUP BY t.id;

Making:

t_group
=======
{"id":1,"name":"one"}

Great! However, when I try to exchange t and y for non-materialized VIEWs:

CREATE VIEW vt AS SELECT t.* FROM t;
CREATE VIEW vy AS SELECT y.* FROM y;

And run the same queries with the new VIEWs

SELECT row_to_json(vt.*) as "view_t_nogroup" FROM vt
JOIN vy ON vt.id = vy.tid
WHERE vt.id = 1
AND vy.amount > 1;

SELECT row_to_json(vt.*) as "view_t_group" FROM vt
JOIN vy ON vt.id = vy.tid
WHERE vt.id = 1
AND vy.amount > 1
GROUP BY vt.id;

I then get the error for the second query:

ERROR: column "vt.*" must appear in the GROUP BY clause or be used in an aggregate function

Why does this happen? Whatever logic that saw that * contained id when using the table t should still work for the view vt, right?

Fiddle link: http://sqlfiddle.com/#!17/26b0e/5

I'm running PostgreSQL 9.6. I'm also aware that this is a contrived example, but I am running into this error in a very similar one to many relation where multiple GROUP BY statements are needed and so just using DISTINCT isn't an option, like it would be for this example.

Best Answer

The problem appears because Postgres knows that table t has (id) as the primary key - so we can use GROUP BY t.id and have more columns from that table in the SELECT list without aggregating them - but it can't infer the same for view vt.

For the examples provided, you can rewrite them as not to use GROUP BY at all. Since you don't need any column from the second joined table in the select list, you can convert the JOIN to an EXISTS subquery, i.e. convert the join to a semi-join. Now both the tables and views semijoin work without errors:

SELECT row_to_json(t.*) as "t_group" FROM t
WHERE t.id = 1 
  AND EXISTS 
      ( SELECT 1 
        FROM y 
        WHERE t.id = y.tid
          AND y.amount > 1 
      ) ;

SELECT row_to_json(vt.*) as "t_group" FROM vt
WHERE vt.id = 1 
  AND EXISTS 
      ( SELECT 1 
        FROM vy 
        WHERE vt.id = vy.tid
          AND vy.amount > 1 
      ) ;