PostgreSQL – Handling Race Conditions with Unique Constraint on Insert

concurrencyplpgsqlpostgresqltransactionupsert

I have a web service (http api) which allows a user to restfully create a resource. After authentication and validation I pass off the data to a Postgres function and allow it to check authorisation and create the records in the database.

I found a bug today when two http requests had been made within the same second which caused this function to be called with identical data twice. There is a clause inside the function which makes a select on a table to see if a value exists, if it does exist then I take the ID and use that on my next operation, if it doesn't then I insert the data, get back the ID and then use that on the next operation. Below is a simple example.

select id into articleId from articles where title = 'my new blog';
if articleId is null then
    insert into articles (title, content) values (_title, _content)
    returning id into articleId;
end if;
-- Continue, using articleId to represent the article for next operations...

As you can probably guess, I got a phantom read on the data where both transactions entered the if articleId is null then block and tried to insert onto the table. One succeeded and the other blew up because of a unique constraint on a field.

I've had a look around at how to defend against this and found a few different options but none seem like they fit our needs for a few reasons and I'm struggling to find any alternatives.

  1. insert ... on conflict do nothing/update... I first looked at the on conflict option which looked good however the only option is to do nothing which then doesn't return the ID of the record that caused the collision, and do update won't work as it will cause triggers to be fired off when in reality the data hasn't changed. In some instances this isn't a problem but in many cases this might invalidate sessions user sessions which isn't something we can do.
  2. set transaction isolation level serializable; this seems like the most attractive answer, however even our test suite can cause read/write dependencies where, like the above, we want to insert if something doesn't exist and return it if it does and carry on with further operations. If we have several transactions pending that run the above code it will cause a read/write dependency error as outlined in the transaction-iso of the Postgres docs.

How should is this sort of concurrent read/write transaction handled?

Neither myself or my team claim to be database experts, let alone Postgres experts but feel like this must be a solved problem, or one people have come across in the past. We're open to any suggestions. If the information provided above is not enough, please comment and I'll add more information as needed.

Best Answer

Try the insert first, with on conflict ... do nothing and returning id. If the value already exists, you will get no result from this statement, so you have then to execute a select to get the ID.

If two transactions try to do this at the same time, one of them will block on the insert (because the database does not yet know if the other transaction will commit or rollback), and continue only after the other transaction has finished.