Postgresql – INSERT duplicate rows for multiple, cascading 1:M child tables

cteinsertpostgresql

Imagine multiple parent-child tables with 1:M relationships. I would like to "cascade"-insert duplicate rows based on root parent table row selection.
Each table has IDENTITY primary key and each child table has FK to its parent ID (one level up).

The goal

Given root parent table ID, insert duplicate rows for it and all of its children.

I tried "cascading"-insert CTEs, but ran into issues where RETURNING is limited to return only inserted data, while I need extra information to connect the next INSERT.

I was able to accomplish this by adding an extra column (copied_from_id).

My question

Is there a way to accomplish the same without extra column?

I did see this answer by @Erwin Brandstetter but his example only has 1 parent and child and I was not sure how to extend this to multiple levels

Example

Here is example DDL and DML to illustrate the issue

  • lvl_one – top most, root, parent table
  • lvl_two – child table of lvl_one (1:M)
  • lvl_three – child table of lvl_two (1:M)

Setup

--DROP TABLE IF EXISTS lvl_one,lvl_two,lvl_three CASCADE;
CREATE TABLE IF NOT EXISTS public.lvl_one (
    id      bigint NOT NULL GENERATED ALWAYS AS IDENTITY,
    name    text,
    CONSTRAINT lvl_one_pk PRIMARY KEY (id)
);
CREATE TABLE IF NOT EXISTS public.lvl_two (
    id          bigint NOT NULL GENERATED ALWAYS AS IDENTITY,
    lvl_one_id  bigint NOT NULL,    
    name        text,
    CONSTRAINT lvl_two_pk PRIMARY KEY (id),
    CONSTRAINT lvl_two_lvl_one_id_fk FOREIGN KEY (lvl_one_id)
        REFERENCES public.lvl_one (id) 
);
CREATE TABLE IF NOT EXISTS public.lvl_three (
    id          bigint NOT NULL GENERATED ALWAYS AS IDENTITY,
    lvl_two_id  bigint NOT NULL,    
    name        text,
    CONSTRAINT lvl_three_pk PRIMARY KEY (id),
    CONSTRAINT lvl_three_lvl_two_id_fk FOREIGN KEY (lvl_two_id)
        REFERENCES public.lvl_two (id) 
);

Initial data

-- initial data
INSERT INTO lvl_one(name)               VALUES ('Honda'),   ('Ford'),       ('Toyota');
INSERT INTO lvl_two(lvl_one_id, name)   VALUES (1,'Civic'), (1,'Passport'), (3,'Prius');
INSERT INTO lvl_three(lvl_two_id, name) VALUES (1,'door'),  (1,'window'),   (3,'trunk');


SELECT * FROM lvl_one ORDER BY id;
-- id, name
--  1, "Honda"
--  2. "Ford"
--  3, "Toyota"

SELECT * FROM lvl_two ORDER BY id;
-- id, lvl_one_id, name
--  1, 1,   "Civic"
--  2, 1,   "Passport"
--  3, 3,   "Prius"

SELECT * FROM lvl_three ORDER BY id;
-- id, lvl_two_id,  name
--  1, 1,   "door"
--  2, 1,   "window"
--  3, 3,   "trunk"

SELECT 
  one.id AS one_id, one.name AS one_name
, two.id AS two_id, two.name AS two_name
, three.id AS three_id, three.name AS three_name
FROM lvl_one AS one 
LEFT OUTER JOIN lvl_two AS two ON one.id = two.lvl_one_id
LEFT OUTER JOIN lvl_three AS three ON two.id = three.lvl_two_id
ORDER BY one.id, two.id, three.id;
--1 "Honda"     1       "Civic"     1       "door"
--1 "Honda"     1       "Civic"     2       "window"
--1 "Honda"     2       "Passport"  NULL    NULL        
--2 "Ford"      NULL    NULL        NULL    NULL            
--3 "Toyota"    3       "Prius"     3       "trunk"

Solution (with adding extra column)

ALTER TABLE lvl_one   ADD COLUMN copied_from_id bigint;
ALTER TABLE lvl_two   ADD COLUMN copied_from_id bigint;
ALTER TABLE lvl_three ADD COLUMN copied_from_id bigint;


-- copy row id=1 from lvl_one and all its child tables
WITH source_one AS (
    SELECT id,name 
    FROM lvl_one 
    WHERE id=1
)
, copy_one AS (
    INSERT INTO lvl_one(name,copied_from_id)
    SELECT name,id AS copied_from_id
    FROM source_one
    RETURNING id AS new_one_id, copied_from_id
)
, copy_two AS (
    INSERT INTO lvl_two(lvl_one_id,name,copied_from_id)
    SELECT new_one_id, lvl_two.name,lvl_two.id AS copied_from_id
    FROM copy_one 
    INNER JOIN lvl_one ON lvl_one.id = copy_one.copied_from_id
    INNER JOIN lvl_two ON lvl_two.lvl_one_id = lvl_one.id
    RETURNING id AS new_two_id, copied_from_id
)
, copy_three AS (
    INSERT INTO lvl_three(lvl_two_id,name,copied_from_id)
    SELECT new_two_id, lvl_three.name, lvl_three.id AS copied_from_id
    FROM copy_two
    INNER JOIN lvl_two ON lvl_two.id = copy_two.copied_from_id
    INNER JOIN  lvl_three ON lvl_three.lvl_two_id = lvl_two.id
    RETURNING id AS new_three_id, copied_from_id
)
SELECT * FROM copy_one, copy_two, copy_three;

Expected results of duplicating lvl_one.id=1

Following rows will be created in all 3 tables as a result of "duplicating" lvl_one.id=1 row.

-- lvl_one
-- 4,Honda,1

-- lvl_two
--4,4,Civic,1
--5,4,Passport,2

-- lvl_three
--4,4,door,1
--5,4,window,2

Best Answer

Unfortunately, the RETURNING clause of an INSERT can only work with columns from the inserted row. Columns added by a FROM clause are invisible there.

See:

To work around this limitation, I suggest to SELECT before each INSERT, and generate prospective new serial IDs in advance with nextval(). Then you have each old and new ID in the same row to make necessary connections.

An minor additional complication for this approach is that you have IDENTITY columns with GENERATE ALWAYS. So we need OVERRIDING SYSTEM VALUE for INSERT to write to those columns anyway. (Or you create your IDENTITY columns with GENERATED BY DEFAULT):

WITH ins1 AS (
   INSERT INTO lvl_one(name)
   SELECT name
   FROM   lvl_one
   WHERE  id = 1  --  $1 here
   RETURNING id AS new_parent_id, name   -- just the one
   )
, sel2 AS (
   SELECT ins1.new_parent_id, t2.id, t2.name, nextval(pg_get_serial_sequence('lvl_two', 'id')) AS new_id
   FROM   ins1
   JOIN   lvl_two t2 ON t2.lvl_one_id = 1   --  and $1 here
   )
, ins2 AS (
   INSERT INTO lvl_two(id, lvl_one_id, name) OVERRIDING SYSTEM VALUE 
   SELECT new_id, new_parent_id, name
   FROM   sel2
   )
, sel3 AS (
   SELECT sel2.new_id AS new_parent_id, t3.id, t3.name, nextval(pg_get_serial_sequence('lvl_three', 'id')) AS new_id
   FROM   sel2
   JOIN   lvl_three t3 ON t3.lvl_two_id = sel2.id  -- old parent ID
   )
, ins3 AS (
   INSERT INTO lvl_three(id, lvl_two_id, name) OVERRIDING SYSTEM VALUE 
   SELECT new_id, new_parent_id, name
   FROM   sel3
   )
SELECT ins1.new_parent_id AS lvl1_id, ins1.name AS lvl1_name
     , sel2.new_id AS lvl2_id, sel2.name AS lvl2_name
     , sel3.new_id AS lvl3_id, sel3.name AS lvl3_name
FROM   ins1 
LEFT   JOIN sel2 USING (new_parent_id)
LEFT   JOIN sel3 ON sel3.new_parent_id = sel2.new_id
ORDER  BY lvl1_id, lvl2_id, lvl3_id;

db<>fiddle here

Performance should be pretty similar. The main benefit is that we need no additional table columns as requested.

The first INSERT is simple because it can only affect a single row by definition, so I didn't add another SELECT there. The following steps follow the same pattern and can descend as many levels as you need.

Also note that the outer SELECT in your original solution would produce incorrect results (not affecting actually inserted rows):

...
SELECT * FROM copy_one, copy_two, copy_three;

CROSS JOIN between the tables would combine rows that shouldn't be combined and exclude rows that shouldn't be excluded.