PostgreSQL – Handling Race Conditions with Unique Constraint on Insert

concurrencyplpgsqlpostgresqltransactionupsert

I have a web service (http api) which allows a user to restfully create a resource. After authentication and validation I pass off the data to a Postgres function and allow it to check authorisation and create the records in the database.

I found a bug today when two http requests had been made within the same second which caused this function to be called with identical data twice. There is a clause inside the function which makes a select on a table to see if a value exists, if it does exist then I take the ID and use that on my next operation, if it doesn't then I insert the data, get back the ID and then use that on the next operation. Below is a simple example.

select id into articleId from articles where title = 'my new blog';
if articleId is null then
    insert into articles (title, content) values (_title, _content)
    returning id into articleId;
end if;
-- Continue, using articleId to represent the article for next operations...

As you can probably guess, I got a phantom read on the data where both transactions entered the if articleId is null then block and tried to insert onto the table. One succeeded and the other blew up because of a unique constraint on a field.

I've had a look around at how to defend against this and found a few different options but none seem like they fit our needs for a few reasons and I'm struggling to find any alternatives.

insert ... on conflict do nothing/update... I first looked at the on conflict option which looked good however the only option is to do nothing which then doesn't return the ID of the record that caused the collision, and do update won't work as it will cause triggers to be fired off when in reality the data hasn't changed. In some instances this isn't a problem but in many cases this might invalidate sessions user sessions which isn't something we can do.
set transaction isolation level serializable; this seems like the most attractive answer, however even our test suite can cause read/write dependencies where, like the above, we want to insert if something doesn't exist and return it if it does and carry on with further operations. If we have several transactions pending that run the above code it will cause a read/write dependency error as outlined in the transaction-iso of the Postgres docs.

How should is this sort of concurrent read/write transaction handled?

Neither myself or my team claim to be database experts, let alone Postgres experts but feel like this must be a solved problem, or one people have come across in the past. We're open to any suggestions. If the information provided above is not enough, please comment and I'll add more information as needed.

Best Answer

Try the insert first, with on conflict ... do nothing and returning id. If the value already exists, you will get no result from this statement, so you have then to execute a select to get the ID.

If two transactions try to do this at the same time, one of them will block on the insert (because the database does not yet know if the other transaction will commit or rollback), and continue only after the other transaction has finished.

Related Solutions

Postgresql – Concurrency with Select-conditional Insert/Update/Delete – PostgreSQL

Regarding the SERIALIZABLE transactions, be sure to pay attention to the "for optimal performance" tips near the bottom of this section: http://www.postgresql.org/docs/9.1/interactive/transaction-iso.html#XACT-SERIALIZABLE These can reduce your rollback rates quite significantly.

That said, and not to discourage use of the SERIALIZABLE transaction isolation level, what you describe sounds like exactly the sort of problem that EXCLUSION CONSTRAINTs were designed to solve. Where there is a declarative constraint to enforce a business rule, that is almost always better than using imperative coding (in triggers or application code).

http://www.postgresql.org/docs/9.1/static/ddl-constraints.html#DDL-CONSTRAINTS-EXCLUSION

http://www.postgresql.org/docs/9.1/static/sql-createtable.html#SQL-CREATETABLE-EXCLUDE

To use this feature with dates you would currently need to use the TEMPORAL module:

http://pgfoundry.org/projects/temporal/

You will probably want to migrate TEMPORAL columns to a RANGE when you move to PostgreSQL version 9.2 (expected to be released this summer). The 9.2 RANGE feature, the TEMPORAL data type, and the EXCLUSION CONSTRAINT feature were all the work Jeff Davis to address problems like that posed in the question. They were designed to work together.

Full disclosure: The SERIALIZABLE transaction behavior in 9.1 was developed by myself and Dan R.K. Ports of MIT.

PostgreSQL RETURN NEXT Function – How to Use

If the results are not meant to be used in a subquery but by code, you may use a REFCURSOR in a transaction.

Example:

CREATE FUNCTION example_cursor() RETURNS refcursor AS $$
DECLARE
  c refcursor;
BEGIN
  c:='mycursorname';
  OPEN c FOR select * from generate_series(1,100000);
  return c;                                       
end;
$$ language plpgsql;

Usage for the caller:

BEGIN;
SELECT example_cursor();
 [output: mycursor]
FETCH 10 FROM mycursor;

 Output:

 generate_series 
-----------------
               1
               2
               3
               4
               5
               6
               7
               8
               9
              10

CLOSE mycursor;
END;

When not interested in piecemeal retrieval, FETCH ALL FROM cursorname may also be used to stream all results to the caller in one step.

Best Answer

Related Solutions

Postgresql – Concurrency with Select-conditional Insert/Update/Delete – PostgreSQL

PostgreSQL RETURN NEXT Function – How to Use

Related Question