PostgreSQL – INSERT Rows in Two Tables Preserving Connection to Third Table

insertpostgresqlsequence

As part of a recent database development project, I'm stuck with a query for which I seem to find no elegant solution (i.e. without looping).

Relevant bits of the schema:

CREATE TABLE sites (
  site_id SERIAL PRIMARY KEY,
  ...
);

CREATE TABLE properties (
  property_id SERIAL PRIMARY KEY,
  property_def_id INT NOT NULL,
  FOREIGN KEY (property_def_id) REFERENCES property_def(property_def_id)
);

CREATE TABLE site_properties (
  property_id INT NOT NULL,
  site_id INT NOT NULL,
  PRIMARY KEY (property_id, site_id),
  FOREIGN KEY (property_id) REFERENCES properties(property_id),
  FOREIGN KEY (site_id) REFERENCES sites(site_id)
);

The task:
Insert one new, distinct property in table properties for each row in table sites and also insert a row in site_properties to store the relation between both.

Desired behaviour:

for each site_id (as my_site_id) in sites:  
* INSERT INTO properties (property_def_id) VALUES (1) RETURNING property_id INTO my_prop_id
* INSERT INTO site_properties (property_id, site_id) VALUES (my_prop_id, my_site_id)

At the end, I would have in the site_properties table each "new" property having being assigned only to one site (i.e. no cross-joins)

I feel it should be possible to do this in one query, but I couldn't figure out how for the love of me. I have tried CTEs but I ended up being stuck with a list of property_id's, a list of sites and no way to join between the two.

What am I missing?

Best Answer

Problem

Expressions in the RETURNING clause of an INSERT can only involve columns from inserted rows, but not columns from additional tables joined in. That makes it hard to associate each newly inserted row in the table properties to one distinct site_id from table sites.

The manual on INSERT:

The optional RETURNING clause causes INSERT to compute and return value(s) based on each row actually inserted (or updated, if an ON CONFLICT DO UPDATE clause was used). This is primarily useful for obtaining values that were supplied by defaults, such as a serial sequence number. However, any expression using the table's columns is allowed.

Bold emphasis mine.

Solution

Use nextval() from the family of sequence functions to solve this cheaply and elegantly. Used to be the means of choice before RETURNING was introduced for INSERT with Postgres 8.2 and still works.

Using pg_get_serial_sequence() you also don't need to know the name of the underlying sequence for the serial column:

WITH site_prop AS (
   SELECT site_id, nextval(pg_get_serial_sequence('properties', 'property_id')) AS prop_id
   FROM   sites
   )
, props AS (
   INSERT INTO properties (property_id, property_def_id)
   SELECT prop_id, 1 FROM site_prop
   )
INSERT INTO site_properties (site_id, property_id)
SELECT site_id, prop_id FROM site_prop;

This way you only need a single scan of table sites and neither window functions nor joins. Much cheaper.

Better yet

It would seem like a FK violation to insert rows with new property_id into site_properties while the key does not exist in properties, yet. But that's not the case if you insert the keys into properties in the same command. CTEs are considered part of the same command. The manual:

A constraint that is not deferrable will be checked immediately after every command.

So we can simplify some more, using next_val() and the RETURNING clause:

WITH site_prop AS (
   INSERT INTO site_properties (site_id, property_id)
   SELECT site_id, nextval(pg_get_serial_sequence('properties', 'property_id'))
   FROM   sites
   RETURNING site_id, property_id
   )
INSERT INTO properties (property_id, property_def_id)
SELECT property_id, 1 FROM site_prop;

Related: