Postgresql – Conditional INSERT with a nested CTE

cteinsertpostgresqlpostgresql-10

I am trying to figure out if there is a way I can make nested CTEs work for this particular case.

Consider the following (highly contrived) scenario which is based on the actual application: there's a single column table of employee id's. Then there's an employee property table with all the details. (The main reason behind the single col table is often as a matter of course new employee id's need to be created in lots and assigned before any details of the actual staff is known.)

Now to the task at hand, we are inserting details (i.e. the name) for a new employee, but first we need to check if an employee with that name already exists. If it does, we'll simply return the id, and if not, we'll create a new employee record and then insert the details, finally returning the newly created id.

To recreate this test scenario:

CREATE TABLE public.employee (
    id text DEFAULT gen_random_uuid(),
    PRIMARY KEY (id)
);

CREATE TABLE public.employee_details (
    employee_id text,
    name text,
    PRIMARY KEY (employee_id),
    FOREIGN KEY (employee_id) REFERENCES public.employee(id)
);

The query I am trying to hammer into shape looks like shown below.

with 
e as 
    (select name, employee_id from employee_details where name = 'jack bauer'), 

i as (insert into employee_details (name, employee_id) 
    select 'jack bauer', 
        (with a as (insert into employee values(default) RETURNING id) select a.id from a)
    where not exists (select 1 from e) returning name, employee_id) 

select employee_id, name from e
union all 
select employee_id, name from i; 

If I replace the nested CTE with an already created id (executing the nested CTE separately), it works (but can result in the creation of a superfluous id). It is also possible to simply move the nested CTE to the top level (so the whole thing will look like with e as (..), i as (..), a as (..) select .. where not exists..., but this would also mean that a superfluous employee id is created where no details need to be inserted. I want to figure out a way to do it "inline" – so the new id will only be created if the not exists clause returns true.

I keep getting the error:

WITH clause containing a data-modifying statement must be at the top level.

I guess the trouble is that the nested CTE returns a "column" whereas the whole query will work if it gets a "value" instead (which it does, when one simply copies in a text value instead of the CTE). I did come across a somewhat related discussion at this question mentioning an apparent bug which has been fixed since 9.3. I don't know if that is related to my troubles here. To quote from the linked discussion:

The parse analysis code seems to think that WITH can only be attached to the top level or a leaf-level SELECT within a set operation tree; but the grammar follows the SQL standard which says no such thing

I'm using Postgres 10.3.

Best Answer

For the purpose of this question, I'll assume employee_details.name to be defined UNIQUE. Else, the whole operation wouldn't make sense.

You cannot nest a data-modifying CTE like you tried (as you already found out the hard way) - and you don't need to. This query would achieve your objective:

WITH e AS (
   SELECT name, employee_id
   FROM   employee_details
   WHERE  name = 'jack bauer'
   )
 , i1 AS (
   INSERT INTO employee             -- no target columns!
   SELECT                           -- empty SELECT list!
   WHERE NOT EXISTS (SELECT FROM e)
   RETURNING id
   )
 , i2 AS (
   INSERT INTO employee_details (name, employee_id) 
   SELECT 'jack bauer', id
   FROM   i1
   RETURNING name, employee_id
   )
SELECT employee_id, name FROM e
UNION ALL 
SELECT employee_id, name FROM i2;

The core feature is the INSERT with no target columns and an empty SELECT. Postgres fills all columns not listed in the SELECT with default values. This way we can replace the unconditional VALUES (default) with a conditional INSERT. The CTE i1 only inserts a row if the given name was not found.

The manual:

If no list of [target] column names is given at all, the default is all the columns of the table in their declared order; [...]

Each column not present in the explicit or implicit column list will be filled with a default value, either its declared default value or null if there is none.

This is a Postgres specific extension of the standard:

Also, the case in which a column name list is omitted, but not all the columns are filled from the VALUES clause or query, is disallowed by the standard.

The final CTE i2 only inserts a row if i1 returned a row. Voilá.

This is subject to race conditions under concurrent write load to the same tables. If you need to rule that out, you need to do more. Related:

Without the complications from the conditional INSERT in the 2nd table, this would boil down to a common case of SELECT or INSERT:

Aside

"id" text DEFAULT gen_random_uuid()

I'd strongly advise to use the data type uuid to store UUIDs.