PostgreSQL – CTE INSERT, then SELECT with JOIN

postgresql

I'm trying to figure out why this query doesn't work. The goal is to insert an item, return its ID, and use that ID to fetch related data from another table. However, it returns an empty record set:

WITH inserted as (
    INSERT INTO food_update(diary_id, food_id, value)
    VALUES($1, $3, $4) RETURNING id ) 
SELECT * FROM food_update INNER JOIN food ON food.id = food_update.food_id where food_update.id=(select id from inserted)

However, if I change it to the following query, it returns the newly inserted ID:

WITH inserted as (
        INSERT INTO food_update(diary_id, food_id, value)
        VALUES($1, $3, $4) RETURNING id ) 
(select id from inserted)

When I then run the second part of the first query (SELECT * FROM food_update INNER JOIN food ON food.id = food_update.food_id WHERE food_update.id = ...) as a separate query with the newly inserted ID, it works. The two parts just don't seem to work together. Am I not seeing something obvious, or is there a better way to do this?

I can provide the schema, but it's really a general question: I'd just like to know why the INSERT doesn't seem to return the newly created sequence value. Another way of doing it would be to use CURRVAL, but I'm not exactly sure how concurrency works in Postgres. If two queries ran at the same time, could one of them get the other's ID? Then again, I'm fairly sure the table is meant to be locked while the query is being executed?

As requested, here's a schema and a minimal example that isolates my issue, so anyone who wants to test it can have a look. This assumes there's an active database with a public schema.

Schema creation + records

CREATE TABLE public.food (
    created timestamptz NULL,
    modified timestamptz NULL,
    id bigserial NOT NULL,
    "name" text NULL,
    CONSTRAINT food_pkey PRIMARY KEY (id)
);


CREATE TABLE public.food_update (
    id bigserial NOT NULL,
    created timestamptz NULL,
    modified timestamptz NULL,
    food_id int8 NULL,
    value numeric(10,2) NULL,
    CONSTRAINT food_update_pkey PRIMARY KEY (id),
    CONSTRAINT food_update_food_id_fkey FOREIGN KEY (food_id) REFERENCES food(id)
);



INSERT INTO public.food
(created, modified, id, "name")
VALUES(NULL, NULL, 1, 'TEST');

This is the query (a slightly modified version of mine). It still returns nothing.

WITH inserted as (
    INSERT INTO food_update(food_id, value)
    values($1, $2) RETURNING id ) 
SELECT * FROM food_update INNER JOIN food ON food.id = food_update.food_id where food_update.id=(select id from inserted)

Best Answer

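I'm not sure I understand your scenario, but the outer SELECT cannot see the row the CTE has just inserted: all parts of a statement with data-modifying CTEs run against the same snapshot, so the new id does not match any row visible in food_update yet. RETURNING is the only way to pass the new row on to the outer query. Would something along these lines: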

WITH inserted as (
    INSERT INTO food_update(food_id, value)
    values(1, 12) RETURNING id, food_id 
) 
SELECT * 
FROM inserted 
INNER JOIN food 
    ON food.id = inserted.food_id

do?
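If more columns of the new row are needed, a variation with RETURNING * might be worth trying. This is only a sketch: the temp tables are trimmed-down stand-ins for the schema in the question (the created/modified columns are left out), and everything is wrapped in a transaction that is rolled back.

```sql
BEGIN;

-- Trimmed-down stand-ins for the tables in the question (assumption).
CREATE TEMP TABLE food (
    id     bigserial PRIMARY KEY,
    "name" text
);
CREATE TEMP TABLE food_update (
    id      bigserial PRIMARY KEY,
    food_id int8 REFERENCES food (id),
    value   numeric(10,2)
);
INSERT INTO food (id, "name") VALUES (1, 'TEST');

WITH inserted AS (
    INSERT INTO food_update (food_id, value)
    VALUES (1, 12)
    RETURNING *              -- hand the whole new row to the outer query
)
SELECT i.id, i.food_id, i.value, f."name"
FROM   inserted i
JOIN   food f ON f.id = i.food_id;   -- join against the CTE, not food_update

ROLLBACK;
```

The outer query reads only from the CTE and from food, so it does not depend on the new row being visible in food_update.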

Related Solutions

PostgreSQL UPSERT issue with NULL values

Clarify ON CONFLICT DO UPDATE behavior

Consider the manual here:

For each individual row proposed for insertion, either the insertion proceeds, or, if an arbiter constraint or index specified by conflict_target is violated, the alternative conflict_action is taken.

Bold emphasis mine. So you do not have to repeat predicates for columns included in the unique index in the WHERE clause to the UPDATE (the conflict_action):

INSERT INTO test_upsert AS tu
       (name   , status, test_field  , identifier, count) 
VALUES ('shaun', 1     , 'test value', 'ident'   , 1)
ON CONFLICT (name, status, test_field) DO UPDATE
SET count = tu.count + 1;
-- redundant, not needed:
-- WHERE tu.name = 'shaun' AND tu.status = 1 AND tu.test_field = 'test value'

The unique violation already establishes what your added WHERE clause would enforce redundantly.
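To see this in action, here is a minimal sketch. The table definition is an assumption (the question's DDL is not shown here, so the column types are guessed), and the transaction is rolled back at the end.

```sql
BEGIN;

-- Hypothetical table definition; column types are assumed.
CREATE TEMP TABLE test_upsert (
    name       text,
    status     int,
    test_field text,
    identifier text,
    count      int,
    UNIQUE (name, status, test_field)
);

-- First run: no conflict, the row is inserted with count = 1.
INSERT INTO test_upsert AS tu
       (name   , status, test_field  , identifier, count)
VALUES ('shaun', 1     , 'test value', 'ident'   , 1)
ON CONFLICT (name, status, test_field) DO UPDATE
SET count = tu.count + 1;

-- Second run: the unique violation selects the existing row,
-- so only that row is updated, without any WHERE clause.
INSERT INTO test_upsert AS tu
       (name   , status, test_field  , identifier, count)
VALUES ('shaun', 1     , 'test value', 'ident'   , 1)
ON CONFLICT (name, status, test_field) DO UPDATE
SET count = tu.count + 1;

SELECT name, status, test_field, count FROM test_upsert;  -- one row, count = 2

ROLLBACK;
```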

Clarify partial index

Add a WHERE clause to make it an actual partial index like you mentioned yourself (but with inverted logic):

CREATE UNIQUE INDEX test_upsert_partial_idx
ON public.test_upsert (name, status)
WHERE test_field IS NULL;  -- not: "is not null"

To use this partial index in your UPSERT you need a matching conflict_target like @ypercube demonstrates:

ON CONFLICT (name, status) WHERE test_field IS NULL

Now the above partial index is inferred. However, as the manual also notes:

[...] a non-partial unique index (a unique index without a predicate) will be inferred (and thus used by ON CONFLICT) if such an index satisfying every other criteria is available.

If you have an additional (or only) index on just (name, status) it will (also) be used. An index on (name, status, test_field) would explicitly not be inferred. This doesn't explain your problem, but may have added to the confusion while testing.
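Put together, a complete statement using the partial index could look like the sketch below. The table definition is again an assumption, and there is deliberately no other unique index on the table, so only the partial index can be inferred.

```sql
BEGIN;

-- Hypothetical table definition; column types are assumed.
CREATE TEMP TABLE test_upsert (
    name       text,
    status     int,
    test_field text,
    identifier text,
    count      int
);

CREATE UNIQUE INDEX test_upsert_partial_idx
ON test_upsert (name, status)
WHERE test_field IS NULL;

-- Run twice with test_field NULL: the second run hits the partial index
-- and takes the DO UPDATE branch.
INSERT INTO test_upsert AS tu (name, status, test_field, identifier, count)
VALUES ('shaun', 1, NULL, 'ident', 1)
ON CONFLICT (name, status) WHERE test_field IS NULL   -- matches the index predicate
DO UPDATE SET count = tu.count + 1;

INSERT INTO test_upsert AS tu (name, status, test_field, identifier, count)
VALUES ('shaun', 1, NULL, 'ident', 1)
ON CONFLICT (name, status) WHERE test_field IS NULL
DO UPDATE SET count = tu.count + 1;

-- A row with a non-NULL test_field is not covered by the partial index,
-- so it never conflicts and is simply inserted.
INSERT INTO test_upsert AS tu (name, status, test_field, identifier, count)
VALUES ('shaun', 1, 'test value', 'ident', 1)
ON CONFLICT (name, status) WHERE test_field IS NULL
DO UPDATE SET count = tu.count + 1;

SELECT name, status, test_field, count
FROM   test_upsert
ORDER  BY test_field NULLS FIRST;   -- (shaun, 1, NULL, 2) and (shaun, 1, 'test value', 1)

ROLLBACK;
```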

Solution

AIUI, none of the above solves your problem, yet. With the partial index, only special cases with matching NULL values would be caught. And other duplicate rows would either be inserted if you have no other matching unique indexes / constraints, or raise an exception if you do. I suppose that's not what you want. You write:

The composite key is made up of 20 columns, 10 of which can be nullable.

What exactly do you consider a duplicate? Postgres (according to the SQL standard) does not consider two NULL values to be equal. The manual:

In general, a unique constraint is violated if there is more than one row in the table where the values of all of the columns included in the constraint are equal. However, two null values are never considered equal in this comparison. That means even in the presence of a unique constraint it is possible to store duplicate rows that contain a null value in at least one of the constrained columns. This behavior conforms to the SQL standard, but we have heard that other SQL databases might not follow this rule. So be careful when developing applications that are intended to be portable.
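A quick way to check this behavior yourself is the sketch below. The table null_demo and its columns are made up for illustration.

```sql
BEGIN;

-- Hypothetical demo table.
CREATE TEMP TABLE null_demo (
    a int,
    b int,
    UNIQUE (a, b)
);

INSERT INTO null_demo VALUES (1, NULL);
INSERT INTO null_demo VALUES (1, NULL);  -- succeeds: the two NULLs are not considered equal

SELECT count(*) FROM null_demo;          -- 2

ROLLBACK;
```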

Allow null in unique column

I assume you want NULL values in all 10 nullable columns to be considered equal. It is elegant & practical to cover a single nullable column with an additional partial index like demonstrated here:

PostgreSQL multi-column unique constraint and NULL values

But this gets out of hand quickly for more nullable columns. You'd need a partial index for every distinct combination of nullable columns. For just 2 of those that's 3 partial indexes for (a), (b) and (a,b). The number is growing exponentially with 2^n - 1. For your 10 nullable columns, to cover all possible combinations of NULL values, you'd already need 1023 partial indexes. No go.
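For illustration, this is roughly what the case with just 2 nullable columns would look like. Table and index names are hypothetical: x stands for the NOT NULL part of the key, a and b for the nullable columns.

```sql
BEGIN;

-- Hypothetical table: x is NOT NULL, a and b are nullable.
CREATE TEMP TABLE tbl (
    x int NOT NULL,
    a int,
    b int
);

-- Covers rows where neither a nor b is NULL.
CREATE UNIQUE INDEX tbl_x_a_b_idx ON tbl (x, a, b);

-- One partial index per combination of NULL columns: 2^2 - 1 = 3.
CREATE UNIQUE INDEX tbl_a_null_idx  ON tbl (x, b) WHERE a IS NULL;                -- (a)
CREATE UNIQUE INDEX tbl_b_null_idx  ON tbl (x, a) WHERE b IS NULL;                -- (b)
CREATE UNIQUE INDEX tbl_ab_null_idx ON tbl (x)    WHERE a IS NULL AND b IS NULL;  -- (a,b)

INSERT INTO tbl VALUES (1, NULL, NULL);
-- A second identical row would now fail on tbl_ab_null_idx:
-- INSERT INTO tbl VALUES (1, NULL, NULL);

ROLLBACK;
```

Every additional nullable column doubles the number of combinations, which is where the 1023 for 10 columns comes from.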

The simple solution: replace NULL values and define involved columns NOT NULL, and everything would work just fine with a simple UNIQUE constraint.

If that's not an option I suggest an expression index with COALESCE to replace NULL in the index:

CREATE UNIQUE INDEX test_upsert_solution_idx
    ON test_upsert (name, status, COALESCE(test_field, ''));

The empty string ('') is an obvious candidate for character types, but you can use any legal value that either never appears or can be folded with NULL according to your definition of "unique".

Then use this statement:

INSERT INTO test_upsert as tu(name,status,test_field,identifier, count) 
VALUES ('shaun', 1, null        , 'ident', 11)  -- works with
     , ('bob'  , 2, 'test value', 'ident', 22)  -- and without NULL
ON     CONFLICT (name, status, COALESCE(test_field, '')) DO UPDATE  -- match expr. index
SET    count = COALESCE(tu.count + EXCLUDED.count, EXCLUDED.count, tu.count);

Like @ypercube I assume you actually want to add count to the existing count. Since the column can be NULL, adding NULL would set the column NULL. If you define count NOT NULL, you can simplify.
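As a sketch of that simplification: with count defined NOT NULL, the nested COALESCE in the SET clause can go. The table definition below is an assumption, including the DEFAULT 0.

```sql
BEGIN;

-- Hypothetical table definition; count is NOT NULL here.
CREATE TEMP TABLE test_upsert (
    name       text NOT NULL,
    status     int  NOT NULL,
    test_field text,
    identifier text,
    count      int  NOT NULL DEFAULT 0
);

CREATE UNIQUE INDEX test_upsert_solution_idx
    ON test_upsert (name, status, COALESCE(test_field, ''));

INSERT INTO test_upsert AS tu (name, status, test_field, identifier, count)
VALUES ('shaun', 1, NULL, 'ident', 11)
ON     CONFLICT (name, status, COALESCE(test_field, '')) DO UPDATE
SET    count = tu.count + EXCLUDED.count;   -- neither side can be NULL

-- Same key again: NULL is folded to '' by the index, so this is a conflict.
INSERT INTO test_upsert AS tu (name, status, test_field, identifier, count)
VALUES ('shaun', 1, NULL, 'ident', 11)
ON     CONFLICT (name, status, COALESCE(test_field, '')) DO UPDATE
SET    count = tu.count + EXCLUDED.count;

SELECT name, status, test_field, count FROM test_upsert;  -- one row, count = 22

ROLLBACK;
```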

Another idea would be to just drop the conflict_target from the statement to cover all unique violations. Then you could define various unique indexes for a more sophisticated definition of what's supposed to be "unique". But that won't fly with ON CONFLICT DO UPDATE. The manual once more:

For ON CONFLICT DO NOTHING, it is optional to specify a conflict_target; when omitted, conflicts with all usable constraints (and unique indexes) are handled. For ON CONFLICT DO UPDATE, a conflict_target must be provided.
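For completeness, a sketch of the DO NOTHING form without a conflict_target. The table definition is again assumed.

```sql
BEGIN;

-- Hypothetical table definition; column types are assumed.
CREATE TEMP TABLE test_upsert (
    name       text,
    status     int,
    test_field text,
    identifier text,
    count      int,
    UNIQUE (name, status, test_field)
);

INSERT INTO test_upsert (name, status, test_field, identifier, count)
VALUES ('shaun', 1, 'test value', 'ident', 1)
ON CONFLICT DO NOTHING;     -- no conflict_target: any usable unique index counts

INSERT INTO test_upsert (name, status, test_field, identifier, count)
VALUES ('shaun', 1, 'test value', 'ident', 1)
ON CONFLICT DO NOTHING;     -- duplicate is skipped silently, no error

SELECT count(*) FROM test_upsert;   -- 1

ROLLBACK;
```

Replacing DO NOTHING with DO UPDATE in the same statement, still without a conflict_target, is rejected, as the quoted passage says.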

PostgreSQL – Conditional INSERT with a nested CTE

For the purpose of this question, I'll assume employee_details.name to be defined UNIQUE. Else, the whole operation wouldn't make sense.

You cannot nest a data-modifying CTE like you tried (as you already found out the hard way) - and you don't need to. This query would achieve your objective:

WITH e AS (
   SELECT name, employee_id
   FROM   employee_details
   WHERE  name = 'jack bauer'
   )
 , i1 AS (
   INSERT INTO employee             -- no target columns!
   SELECT                           -- empty SELECT list!
   WHERE NOT EXISTS (SELECT FROM e)
   RETURNING id
   )
 , i2 AS (
   INSERT INTO employee_details (name, employee_id) 
   SELECT 'jack bauer', id
   FROM   i1
   RETURNING name, employee_id
   )
SELECT employee_id, name FROM e
UNION ALL 
SELECT employee_id, name FROM i2;

The core feature is the INSERT with no target columns and an empty SELECT. Postgres fills all columns not listed in the SELECT with default values. This way we can replace the unconditional VALUES (default) with a conditional INSERT. The CTE i1 only inserts a row if the given name was not found.

The manual:

If no list of [target] column names is given at all, the default is all the columns of the table in their declared order; [...]

Each column not present in the explicit or implicit column list will be filled with a default value, either its declared default value or null if there is none.

This is a Postgres specific extension of the standard:

Also, the case in which a column name list is omitted, but not all the columns are filled from the VALUES clause or query, is disallowed by the standard.

The final CTE i2 only inserts a row if i1 returned a row. Voilà.
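To try the conditional INSERT in isolation, a sketch like the following may help. The table definitions are assumptions (the question's DDL is not reproduced here), gen_random_uuid() is built in from PostgreSQL 13 on (older versions need the pgcrypto extension), and the empty SELECT list requires PostgreSQL 9.4 or later.

```sql
BEGIN;

-- Hypothetical table definitions.
CREATE TEMP TABLE employee (
    id uuid PRIMARY KEY DEFAULT gen_random_uuid()
);
CREATE TEMP TABLE employee_details (
    employee_id uuid NOT NULL REFERENCES employee (id),
    name        text NOT NULL UNIQUE
);

-- The unconditional form would be: INSERT INTO employee DEFAULT VALUES;
-- The empty SELECT yields one row with no columns, or no row at all
-- if the WHERE condition is false. All columns get their defaults.
INSERT INTO employee
SELECT
WHERE  NOT EXISTS (SELECT FROM employee_details WHERE name = 'jack bauer')
RETURNING id;               -- one new id, since the name does not exist yet

ROLLBACK;
```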

This is subject to race conditions under concurrent write load to the same tables. If you need to rule that out, you need to do more. Related:

How to use RETURNING with ON CONFLICT in PostgreSQL?

Without the complications from the conditional INSERT in the 2nd table, this would boil down to a common case of SELECT or INSERT:

Is SELECT or INSERT in a function prone to race conditions?

Aside

"id" text DEFAULT gen_random_uuid()

I'd strongly advise to use the data type uuid to store UUIDs.
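A sketch of what that could look like, with a hypothetical table name. Besides the more compact storage (16 bytes per value), the uuid type rejects malformed input, which a text column would silently accept.

```sql
BEGIN;

-- Hypothetical table; gen_random_uuid() is built in from PostgreSQL 13 on.
CREATE TEMP TABLE employee_uuid_demo (
    id uuid PRIMARY KEY DEFAULT gen_random_uuid()
);

INSERT INTO employee_uuid_demo DEFAULT VALUES
RETURNING id, pg_typeof(id);    -- pg_typeof reports uuid

-- This would fail with "invalid input syntax for type uuid":
-- INSERT INTO employee_uuid_demo (id) VALUES ('not-a-uuid');

ROLLBACK;
```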
