Postgresql – Removing duplicate rows with two or more left join tables

join;postgresqlpostgresql-9.4

With a query like this

SELECT a.id, a.name,
       COALESCE( json_agg(b.*), '[]'::json ),
       COALESCE( json_agg(c.*), '[]'::json ),
  FROM a
  LEFT JOIN b ON a.id = b.a_id
  LEFT JOIN c ON a.id = c.a_id
 GROUP BY a.id, a.name;

When executed, both c and b will be multiplied by each other and produce duplicated entries in the JSON array object.

I tried changing the query to use 2 subqueries instead, but I get all sorts of errors and warning, such as "Subquery must return only one column", etc.

I also tried using LEFT OUTER JOIN, but I guess I don't yet master how joining tables work as it only applies to b and c is still multiplied and contain duplicates.

Edit: using DISTINCT errors with "could not identify an equality operator for type json" on the COALESCE functions.

How can I fix this query and only aggregate unique rows?


Edit 2

I need to specify that both table b and c are actually VIEWs, and they both have at least one json_agg column as well, so I cannot just use json_agg(DISTINCT b.*). This would've been way too easy.


Edit 3

Here is a small snippet to reproduce the issue :

--DROP TABLE IF EXISTS tbl_a CASCADE;
--DROP TABLE IF EXISTS tbl_b CASCADE;
--DROP TABLE IF EXISTS tbl_c CASCADE;

CREATE TABLE tbl_a (
  id bigserial NOT NULL,
  name character varying(16),
  CONSTRAINT "PK_tbl_a" PRIMARY KEY (id)
) WITH ( OIDS=FALSE );

CREATE TABLE tbl_b (
  a_id bigint NOT NULL,
  foo json NOT NULL DEFAULT '{}'::json,
  CONSTRAINT "FK_tbl_b_a" FOREIGN KEY (a_id)
      REFERENCES tbl_a (id) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE CASCADE
) WITH ( OIDS=FALSE );

CREATE TABLE tbl_c (
  a_id bigint NOT NULL,
  bar json NOT NULL DEFAULT '{}'::json,
  CONSTRAINT "FK_tbl_c_a" FOREIGN KEY (a_id)
      REFERENCES tbl_a (id) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE CASCADE
) WITH ( OIDS=FALSE );

INSERT INTO tbl_a (id,name) VALUES (1, 'Test');
INSERT INTO tbl_b (a_id, foo) VALUES (1, '{"foo":"Hello"}'::json), (1, '{"foo":"World"}'::json);
INSERT INTO tbl_c (a_id, bar) VALUES (1, '{"bar":"abc"}'::json), (1, '{"bar":"def"}'::json);

SELECT tbl_a.id, tbl_a.name,
       COALESCE(json_agg(tbl_b.*), '{}'::json),
       COALESCE(json_agg(tbl_c.*), '{}'::json)
  FROM tbl_a
  LEFT JOIN tbl_b ON tbl_a.id = tbl_b.a_id
  LEFT JOIN tbl_c ON tbl_a.id = tbl_c.a_id
 GROUP BY tbl_a.id, tbl_a.name;

Returns

id  name    coalesce                              coalesce
--  ------  ------------------------------------  ----------------------
 1  "Test"  "[{"a_id":1,"foo":{"foo":"World"}},   "[{"a_id":1,"bar":{"bar":"abc"}},
             {"a_id":1,"foo":{"foo":"Hello"}},    {"a_id":1,"bar":{"bar":"abc"}}, 
             {"a_id":1,"foo":{"foo":"World"}},    {"a_id":1,"bar":{"bar":"def"}}, 
             {"a_id":1,"foo":{"foo":"Hello"}}]"   {"a_id":1,"bar":{"bar":"def"}}]"

Best Answer

I have found a solution. I am not sure if it is optimal, but it works.

SELECT tbl_a.id, tbl_a.name,
       COALESCE( ( SELECT json_agg(tbl_b.*)
                     FROM tbl_b 
                    WHERE tbl_b.a_id = tbl_a.id ), '{}'::json),
       COALESCE( ( SELECT json_agg(tbl_c.*)
                     FROM tbl_c 
                    WHERE tbl_c.a_id = tbl_a.id ), '{}'::json)
  FROM tbl_a;

Which correctly returns

id  name    coalesce                             coalesce
--  ------  -----------------------------------  ----------------------
 1  "Test"  "[{"a_id":1,"foo":{"foo":"World"}},  "[{"a_id":1,"bar":{"bar":"abc"}},
              {"a_id":1,"foo":{"foo":"Hello"}},    {"a_id":1,"bar":{"bar":"def"}}]"