Postgresql – How to implement insert-if-not-found for transactions at serializable isolation level

postgresql, serialization, transaction

I'm having a hard time figuring out exactly how to implement an 'insert if not found' function. Consider the following.

We have a table called artist with 2 columns, (name, id), where name is unique and id is a serial primary key. It's a contrived example, but it illustrates my problem:

    SESSION A                     SESSION B
1.                                SELECT id FROM artist
                                    WHERE name = 'Bob';
2.  INSERT INTO artist (name)
      VALUES ('Bob')
3.                                INSERT INTO artist (name)
                                    VALUES ('Bob')
4.  code that uses 'Bob'
     (e.g., a FK to Bob's ID)
5.                                ??? Bob already exists, but we
                                    can't find it
6.  COMMIT

Session B begins by trying to find an artist called Bob, and finds nothing. However, Session A then creates Bob. Session B tries to insert an artist called Bob, which fails because it violates the unique constraint on name. But here's the bit I don't get: if I change operation 3 to be a select on artist, the table is still empty! This is because I'm using the serializable isolation level, but how can I handle this case?

It seems the only option I have is to abort the entire transaction and try again. If this is the case, should I throw my own 'could not serialize' exception, indicating the application should retry? I already wanted this 'find-or-insert' in a plpgsql function, where I would INSERT, and if that failed SELECT but it seems impossible to find the conflicting row…

Best Answer

This is a bit of a FAQ. You'd find more information if you searched for ON DUPLICATE KEY UPDATE (the MySQL syntax), MERGE (the SQL-standard syntax), or UPSERT. It's surprisingly hard.

The best article I've seen on it yet is Depesz's "why is upsert so complicated". There's also the SO question Insert, on duplicate update (postgresql) which has suggestions but lacks explanation and discussion of the issues.

The short answer is that, yes:

It seems the only option I have is to abort the entire transaction and try again.

When using SERIALIZABLE transactions you just have to re-issue them when they fail, which they will. That is by design, and it happens much more frequently on Pg 9.1 and above because of greatly improved conflict detection. Upsert-like operations are very high conflict, so you may end up retrying quite a bit. If you can do your upserts in READ COMMITTED transactions instead it'll help, but you should still be prepared to retry because there are some unavoidable race conditions.

Let the transaction fail with a unique violation when you insert the conflicting row. If you get a SQLSTATE 23505 unique_violation failure from the transaction and you know you were attempting an upsert, re-try it. If you get a SQLSTATE 40001 serialization_failure you should also retry.
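The retry rule above can be sketched application-side roughly like this. It is only an illustration under stated assumptions: `DbError` is a hypothetical stand-in for whatever exception your driver raises (psycopg2, for example, exposes the SQLSTATE as the `pgcode` attribute of its exceptions), and `run_with_retry`, `max_attempts` and `base_delay` are names made up for this example.

```python
import time

# SQLSTATE codes named in the answer.
UNIQUE_VIOLATION = "23505"
SERIALIZATION_FAILURE = "40001"
RETRYABLE = {UNIQUE_VIOLATION, SERIALIZATION_FAILURE}


class DbError(Exception):
    """Hypothetical stand-in for a driver exception that carries a SQLSTATE."""

    def __init__(self, sqlstate):
        super().__init__(f"SQLSTATE {sqlstate}")
        self.sqlstate = sqlstate


def run_with_retry(txn, max_attempts=5, base_delay=0.0):
    """Call txn() until it succeeds or the attempts run out.

    txn is assumed to run one WHOLE transaction each time it is called
    (begin, statements, commit, and rollback on error), so that every
    attempt starts from a fresh snapshot.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except DbError as exc:
            # Anything other than the two retryable codes is a real error,
            # and the last attempt's failure is passed on to the caller.
            if exc.sqlstate not in RETRYABLE or attempt == max_attempts:
                raise
            # Linear backoff; with base_delay=0.0 the retry is immediate.
            time.sleep(base_delay * attempt)
```

Treating 23505 as retryable is only safe when you know the transaction was attempting an upsert, as noted above; elsewhere a unique violation is usually a genuine error and should not be retried blindly.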

You fundamentally cannot do that retry within a PL/PgSQL function (without dirty hacks like dblink), it must be application side. If PostgreSQL had stored procedures with autonomous transactions then it'd be possible, but it doesn't. In READ COMMITTED mode you can check for conflicting inserts made since the transaction started, but not after the statement that calls the PL/PgSQL function started, so even in READ COMMITTED your "detect conflict with select" approach simply will not work.
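Since the retry has to live in the application, the unit that gets re-issued is the whole transaction rather than a single statement. A minimal sketch of such a unit follows, assuming a Python DB-API connection with psycopg2-style `%s` placeholders, the artist table from the question, and an isolation level configured elsewhere on the connection; the function name `find_or_insert_artist` is made up for this example.

```python
def find_or_insert_artist(conn, name):
    """Run one complete find-or-insert transaction and return the artist id.

    conn is assumed to be a DB-API connection (for example from psycopg2),
    which opens a transaction implicitly on the first statement.
    Any error rolls the transaction back and propagates to the caller,
    which decides whether to re-issue the whole thing.
    """
    cur = conn.cursor()
    try:
        cur.execute("SELECT id FROM artist WHERE name = %s", (name,))
        row = cur.fetchone()
        if row is None:
            # Not visible in our snapshot. A concurrent session may still
            # have inserted it, in which case this statement fails.
            cur.execute(
                "INSERT INTO artist (name) VALUES (%s) RETURNING id", (name,)
            )
            row = cur.fetchone()
        conn.commit()
        return row[0]
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

The function deliberately makes no attempt to recover from a conflict itself: after a failure the transaction is rolled back, and only a fresh attempt, with a fresh snapshot, can see the row the other session committed.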

Read depesz's article for a much better and more detailed explanation.

Related Solutions

Postgresql – Postgres : Executing SELECT within a transaction does not return most recent rows

I think your problem is simply having a BEFORE trigger - it fires before the row is inserted which cannot appear in the view yet. Change it to AFTER and (after considering Craig's suggestion) you are done.

Postgresql – Recursive query to find shortest path in graph

A recursive solution where the connections alternate between names and company_names.

The output shows all paths between the starting and ending node:

WITH RECURSIVE
conn AS
( SELECT
      name, company_name, job,
      (ROW_NUMBER() OVER (ORDER BY company_name, job))::text
          AS rn,
      CONCAT_WS(', ', name, company_name, job)::text
          AS node,
      1 AS lvl
  FROM
      company
  WHERE
      name = 'Bob Ross'
  UNION ALL
  SELECT
      b.name, b.company_name, b.job,
      CONCAT(a.rn, '-', (ROW_NUMBER()
          OVER (PARTITION BY a.rn
                ORDER BY b.name, b.company_name, b.job))::text),
      CONCAT(a.node, ' - ',
             CONCAT_WS(', ', b.name, b.company_name, b.job)),
      lvl + 1
  FROM
      conn AS a
      JOIN company AS b
      ON (  (  a.company_name = b.company_name
           AND a.name <> b.name AND a.lvl % 2 = 1
            )
         OR (  a.company_name <> b.company_name
           AND a.name = b.name AND a.lvl % 2 = 0
            )
         )
      AND a.name <> 'Celine Dion'
      AND b.name <> 'Bob Ross'
)
SELECT r.*
FROM conn AS f
     JOIN conn AS r 
     ON f.rn LIKE CONCAT(r.rn, '%')
WHERE f.name = 'Celine Dion' ;

Test at dbfiddle.uk
