PostgreSQL – Enforce Functional Dependency in GROUP BY Clause

group by, postgresql

Consider the following tables:

    CREATE TABLE users(
        id serial PRIMARY KEY,
        age integer
    );

    CREATE TABLE products(
        id serial PRIMARY KEY,
        sku character varying(255),
        user_id integer REFERENCES users
    );

So basically a product belongs to a user, and a user can have many products.

I want to load users together with their products, doing the aggregation at the database level. (Maybe wrongly, but I assume it would be easier and probably more efficient than running aggregation code in my application layer: Postgres comes with nice aggregate functions, so why not use them?)

So I would run something like:

SELECT "users".*, json_agg("products".*) as "products" 
  FROM "users" LEFT JOIN "products" ON "users"."id" = "products"."user_id" 
  GROUP BY "users"."id"

All good: my driver can parse JSON structures, so there is nothing left to do in the application layer.

The problem arises when I want to introduce some sort of pagination:

WITH "users" AS (SELECT * FROM "users" ORDER BY "id" LIMIT 20)
SELECT "users".*, json_agg("products".*) as "products" 
FROM "users" LEFT JOIN "products" ON "users"."id" = "products"."user_id" 
GROUP BY "users"."id"

I get the well-known error:

ERROR: column "users.age" must appear in the GROUP BY clause or be used in an aggregate function

This happens because "users"."id" is not treated as a primary key of the temporary result set produced by my CTE.

I could fix it by adding every users column to the GROUP BY clause. But I find that cumbersome, and it seems a shame, since I am sure "users"."id" uniquely identifies each row of my subquery.
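For reference, the workaround described above might look like the following sketch. The CTE name `page` is only illustrative (it is not part of the original query), and the column list assumes the `users` table defined at the top, which has just `id` and `age`:

```sql
-- Sketch of the "list every column" workaround.
-- "page" is a hypothetical CTE name chosen to avoid shadowing the users table.
WITH page AS (
    SELECT * FROM users ORDER BY id LIMIT 20
)
SELECT page.*, json_agg(products.*) AS products
FROM page
LEFT JOIN products ON page.id = products.user_id
-- Every non-aggregated column of the CTE has to be listed here,
-- because the CTE result carries no primary key information.
GROUP BY page.id, page.age
ORDER BY page.id;
```

The obvious drawback is that the GROUP BY list has to be kept in sync by hand whenever a column is added to `users`.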

So, is there a way to tell the database engine that "users"."id" acts as a primary key for my subquery?

If not, do you see a better way?

EDIT: This question is quite similar (3 years old though)

Thanks

Best Answer

create table users (user_id int, user_name text);

insert into users values
(1, 'user1'),(2, 'user2'),(3, 'user3'),(4, 'user4'),(5, 'user5');

create table products (user_id int, id int, name text, att1 text);

insert into products values
(2, 1, 'prod1', 'att1'),
(2, 2, 'prod2', 'att1'),
(2, 3, 'prod3', 'att1'),
(2, 4, 'prod4', 'att1'),
(3, 5, 'prod5', 'att1'),
(3, 6, 'prod6', 'att1');

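IMHO your first query would throw the same error message if users.id were not declared as the primary key. For example, against the test tables above (no primary key, and no GROUP BY clause):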

select
    users.*,
    json_agg(products.*) as products
from
    users
left join products on
    users.user_id = products.user_id;

ERROR: column "users.user_id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 2:     users.*,
            ^
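For comparison, here is a minimal sketch of the case where the error does not appear. It assumes a hypothetical table `users_pk`, a copy of the test `users` table with `user_id` declared as primary key. PostgreSQL 9.1 and later treat the other columns as functionally dependent on a grouped primary key, so they may be selected without being listed in GROUP BY:

```sql
-- Hypothetical copy of the test table, this time with a primary key.
create table users_pk (user_id int primary key, user_name text);

insert into users_pk
select user_id, user_name from users;

-- user_name is not in GROUP BY, but it is functionally dependent
-- on the grouped primary key, so the query is accepted.
select
    users_pk.*,
    json_agg(products.*) as products
from
    users_pk
left join products on
    users_pk.user_id = products.user_id
group by
    users_pk.user_id;
```

This dependency is only recognized for base tables with a declared primary key; it is not carried through a CTE or subquery, which is why the paginated version in the question fails.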

But you can move json_agg to a subquery instead of joining both tables, then you can easily limit the number of returned rows.

select 
    users.*,
    (select 
        json_agg(products.*)
     from
        products
     where
        products.user_id = users.user_id)  as products
from
    users
order by
    users.user_id   -- deterministic order, needed for stable pagination
limit 3;

+---------+-----------+------------------------------------------------------+
| user_id | user_name | products                                             |
+---------+-----------+------------------------------------------------------+
| 1       | user1     |                                                      |
+---------+-----------+------------------------------------------------------+
| 2       | user2     | [{"user_id":2,"id":1,"name":"prod1","att1":"att1"},  |
|         |           |  {"user_id":2,"id":2,"name":"prod2","att1":"att1"},  |
|         |           |  {"user_id":2,"id":3,"name":"prod3","att1":"att1"},  |
|         |           |  {"user_id":2,"id":4,"name":"prod4","att1":"att1"}]  |
+---------+-----------+------------------------------------------------------+
| 3       | user3     | [{"user_id":3,"id":5,"name":"prod5","att1":"att1"},  |
|         |           |  {"user_id":3,"id":6,"name":"prod6","att1":"att1"}]  |
+---------+-----------+------------------------------------------------------+
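If you would rather keep the join and the GROUP BY, another option is to group on the base table, so the primary key dependency still applies, and to move the pagination into a filter. The following is only a sketch under the assumption of the question's schema (where `users.id` is declared PRIMARY KEY); the aliases `u` and `p` and the page size are illustrative:

```sql
-- Assumes the question's schema: users.id is a declared primary key.
SELECT u.*, json_agg(p.*) AS products
FROM users AS u
LEFT JOIN products AS p ON p.user_id = u.id
-- Pagination happens here; the outer query still reads the base table,
-- so grouping by the primary key alone is accepted.
WHERE u.id IN (
    SELECT id FROM users ORDER BY id LIMIT 20 OFFSET 0
)
GROUP BY u.id
ORDER BY u.id;
```

One difference from the correlated subquery above: for a user without products, the LEFT JOIN produces a single all-NULL product row, so this version returns `[null]` rather than NULL in the products column.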
