PostgreSQL 9.2 – Fetch Nested Results as JSON

jsonpostgresql

Consider the following tables

Device
--------
id
name
type
--------

components
--------
id INT
type VARCHAR
--------

Manufacturers
-------------
id INT
name VARCHAR
country VARCHAR
-------------

Device_components
-----------------
deviceid REFERENCES Devices(id)
componentid REFERENCES Components(id)
-----------------

Component_Manufacturers
-----------------------
componentid REFERENCES Components(id)
manufacturerid REFERENCES Manufacturers(id)
-----------------------

I want to query the database to return something like this:

{
 "id": 1,
  "name": "phone",
  "components": [
    {
      "id": 1,
      "type": "screen",
      "manufacturers": [
       {
         "id": 1,
          "name": "a",
          "country": "Germany"
        }
      ]
   },
   {
     "id": 2,
      "type": "keyboard",
      "manufacturers": [
        {
         "id": 1,
          "name": "a",
          "country": "UK"
        }
     ]
   }
  ]
}

So far I'm selecting from each table individually and then assemble the JSON object in my application.

Here's some sample data:

CREATE TABLE IF NOT EXISTS Devices
(
    ID SERIAL PRIMARY KEY NOT NULL,
    NAME VARCHAR(30) UNIQUE NOT NULL
);

CREATE TABLE IF NOT EXISTS Components
(
    ID SERIAL PRIMARY KEY NOT NULL,
    NAME VARCHAR(30) UNIQUE NOT NULL
);

CREATE TABLE IF NOT EXISTS Manufacturers
(
    ID SERIAL PRIMARY KEY NOT NULL,
    NAME VARCHAR(30) UNIQUE NOT NULL,
    COUNTRY VARCHAR(40)
);  

CREATE TABLE IF NOT EXISTS Device_components
(
DeviceID INT REFERENCES Devices(ID),
ComponentID INT REFERENCES Components(ID)
);

CREATE TABLE IF NOT EXISTS Component_manufacturers
(
ComponentID INT REFERENCES Components(ID),
ManufacturerID INT REFERENCES Manufacturers(ID)
);

INSERT INTO Devices (NAME) VALUES ('phone');
INSERT INTO Devices (NAME) VALUES ('tablet');
INSERT INTO Devices (NAME) VALUES ('pc');

INSERT INTO Components (NAME) VALUES ('mouse');
INSERT INTO Components (NAME) VALUES ('camera');
INSERT INTO Components (NAME) VALUES ('screen');

INSERT INTO Manufacturers (NAME,Country) VALUES ('foo','france');
INSERT INTO Manufacturers (NAME,Country) VALUES ('bar','spain');
INSERT INTO Manufacturers (NAME,Country) VALUES ('baz','germany');

INSERT INTO Device_components VALUES (1,2);
INSERT INTO Device_components VALUES (1,3);
INSERT INTO Device_components VALUES (2,2);
INSERT INTO Device_components VALUES (2,3);
INSERT INTO Device_components VALUES (3,1);
INSERT INTO Device_components VALUES (3,2);
INSERT INTO Device_components VALUES (3,3);

INSERT INTO Component_manufacturers VALUES (1,1);
INSERT INTO Component_manufacturers VALUES (1,2);
INSERT INTO Component_manufacturers VALUES (1,3);
INSERT INTO Component_manufacturers VALUES (2,2);
INSERT INTO Component_manufacturers VALUES (3,3);

Best Answer

Alright, so you seem to want to create a graph from a join tree. That's a pretty natural thing to want to do.

It's not as easy as it should be in Pg, mainly because of the lack of support for column-aliases in the record pseudo-type.

Here's what I came up with:

SELECT row_to_json(r, true)
FROM (
    SELECT
        d.id,   
        d.name, 
        json_agg(c_row) AS components
    FROM devices d
    INNER JOIN device_components dc ON (dc.deviceid = d.id)
    INNER JOIN (
        SELECT  
            c.id,       
            c.name,     
            json_agg(m) AS manufacturers
        FROM components c
        INNER JOIN component_manufacturers cm ON (cm.componentid = c.id) 
        INNER JOIN manufacturers m ON (cm.manufacturerid = m.id) 
        GROUP BY c.id 
    ) c_row ON (c_row.id = dc.componentid)
    GROUP BY d.id
) r(id, name, component);

The general idea here is, at every level of object nesting where you're producing an array of objects, use json_agg in a group_by. The json_agg function implicitly calls row_to_json to transform a rowtype into json. In the SELECT clause specify aliases for the synthetic columns so the json key names are correct when the row is fed into the outer level.

As the outer level isn't aggregated, use row_to_json over a subquery instead of using json_agg. If you want a single json result instead of a set of json rows, you can change row_to_json to json_agg in the outer level.

I've only tested on 9.3, as that's what I have installed and sqlfiddle seems to be having some issues.

Update: Unfortunately json_agg doesn't exist on 9.2, it was added in 9.3. However, this is a simple enough case that you can just use array_agg instead, so this should work in 9.2:

SELECT row_to_json(r, true)
FROM (
    SELECT
        d.id,   
        d.name, 
        array_agg(c_row) AS components
    FROM devices d
    INNER JOIN device_components dc ON (dc.deviceid = d.id)
    INNER JOIN (
        SELECT  
            c.id,       
            c.name,     
            array_agg(m) AS manufacturers
        FROM components c
        INNER JOIN component_manufacturers cm ON (cm.componentid = c.id) 
        INNER JOIN manufacturers m ON (cm.manufacturerid = m.id) 
        GROUP BY c.id 
    ) c_row ON (c_row.id = dc.componentid)
    GROUP BY d.id
) r(id, name, component);

Ideally PostgreSQL would support this sort of query more naturally, without the subqueries, but I think we'd need a new join type, or at least some functions to use with lateral queries. It all plans out into a civilised query plan anyway (though it'd be better with suitable indexes):

                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on r  (cost=1050.26..1282.94 rows=720 width=106)
   ->  GroupAggregate  (cost=1050.26..1273.94 rows=720 width=188)
         ->  Merge Join  (cost=1050.26..1226.42 rows=7704 width=188)
               Merge Cond: (d.id = dc.deviceid)
               ->  Index Scan using devices_pkey on devices d  (cost=0.15..58.95 rows=720 width=82)
               ->  Sort  (cost=1050.11..1069.37 rows=7704 width=110)
                     Sort Key: dc.deviceid
                     ->  Merge Join  (cost=403.16..552.77 rows=7704 width=110)
                           Merge Cond: (c_row.id = dc.componentid)
                           ->  Subquery Scan on c_row  (cost=253.38..285.63 rows=720 width=110)
                                 ->  GroupAggregate  (cost=253.38..278.43 rows=720 width=286)
                                       ->  Sort  (cost=253.38..258.73 rows=2140 width=286)
                                             Sort Key: c.id
                                             ->  Hash Join  (cost=44.75..135.00 rows=2140 width=286)
                                                   Hash Cond: (cm.manufacturerid = m.id)
                                                   ->  Hash Join  (cost=26.20..87.03 rows=2140 width=86)
                                                         Hash Cond: (cm.componentid = c.id)
                                                         ->  Seq Scan on component_manufacturers cm  (cost=0.00..31.40 rows=2140 width=8)
                                                         ->  Hash  (cost=17.20..17.20 rows=720 width=82)
                                                               ->  Seq Scan on components c  (cost=0.00..17.20 rows=720 width=82)
                                                   ->  Hash  (cost=13.80..13.80 rows=380 width=208)
                                                         ->  Seq Scan on manufacturers m  (cost=0.00..13.80 rows=380 width=208)
                           ->  Sort  (cost=149.78..155.13 rows=2140 width=8)
                                 Sort Key: dc.componentid
                                 ->  Seq Scan on device_components dc  (cost=0.00..31.40 rows=2140 width=8)
(25 rows)