Postgresql – Simultaneous calls to the same function: how are deadlocks occurring

deadlockpostgresqlpostgresql-9.4

My function new_customer is called several times per second (but only once per session) by a web application. The very first thing it does is lock the customer table (to do an 'insert if not exists'—a simple variant of an upsert).

My understanding of the docs is that other calls to new_customer should simply queue until all previous calls have finished:

LOCK TABLE obtains a table-level lock, waiting if necessary for any conflicting locks to be released.

Why is it sometimes deadlocking instead?

definition:

create function new_customer(secret bytea) returns integer language sql 
                security definer set search_path = postgres,pg_temp as $$
  lock customer in exclusive mode;
  --
  with w as ( insert into customer(customer_secret,customer_read_secret)
              select secret,decode(md5(encode(secret, 'hex')),'hex') 
              where not exists(select * from customer where customer_secret=secret)
              returning customer_id )
  insert into collection(customer_id) select customer_id from w;
  --
  select customer_id from customer where customer_secret=secret;
$$;

error from log:

_{2015-07-28 08:02:58 BST DETAIL:  Process 12380 waits for ExclusiveLock on relation 16438 of database 12141; blocked by process 12379.
        Process 12379 waits for ExclusiveLock on relation 16438 of database 12141; blocked by process 12380.
        Process 12380: select new_customer(decode($1::text, 'hex'))
        Process 12379: select new_customer(decode($1::text, 'hex'))
2015-07-28 08:02:58 BST HINT:  See server log for query details.
2015-07-28 08:02:58 BST CONTEXT:  SQL function "new_customer" statement 1
2015-07-28 08:02:58 BST STATEMENT:  select new_customer(decode($1::text, 'hex'))}

relation:

postgres=# select relname from pg_class where oid=16438;
┌──────────┐
│ relname  │
├──────────┤
│ customer │
└──────────┘

edit:

I've managed to get a simple-ish reproducible test case. To me this looks like a bug due to some sort of race condition.

schema:

create table test( id serial primary key, val text );

create function f_test(v text) returns integer language sql security definer set search_path = postgres,pg_temp as $$
  lock test in exclusive mode;
  insert into test(val) select v where not exists(select * from test where val=v);
  select id from test where val=v;
$$;

bash script run simultaneously in two bash sessions:

for i in {1..1000}; do psql postgres postgres -c "select f_test('blah')"; done

error log (usually a handful of deadlocks over the 1000 calls):

2015-07-28 16:46:19 BST ERROR:  deadlock detected
2015-07-28 16:46:19 BST DETAIL:  Process 9394 waits for ExclusiveLock on relation 65605 of database 12141; blocked by process 9393.
        Process 9393 waits for ExclusiveLock on relation 65605 of database 12141; blocked by process 9394.
        Process 9394: select f_test('blah')
        Process 9393: select f_test('blah')
2015-07-28 16:46:19 BST HINT:  See server log for query details.
2015-07-28 16:46:19 BST CONTEXT:  SQL function "f_test" statement 1
2015-07-28 16:46:19 BST STATEMENT:  select f_test('blah')

edit 2:

@ypercube suggested a variant with the lock table outside the function:

for i in {1..1000}; do psql postgres postgres -c "begin; lock test in exclusive mode; select f_test('blah'); end"; done

interestingly this eliminates the deadlocks.

Best Answer

I posted this to pgsql-bugs and the reply there from Tom Lane indicates this is a lock escalation issue, disguised by the mechanics of the way SQL language functions are processed. Essentially, the lock generated by the insert is obtained before the exclusive lock on the table:

I believe the issue with this is that a SQL function will do parsing (and maybe planning too; don't feel like checking the code right now) for the entire function body at once. This means that due to the INSERT command you acquire RowExclusiveLock on the "test" table during function body parsing, before the LOCK command actually executes. So the LOCK represents a lock escalation attempt, and deadlocks are to be expected.

This coding technique would be safe in plpgsql, but not in a SQL-language function.

There have been discussions of reimplementing SQL-language functions so that parsing occurs one statement at a time, but don't hold your breath about something happening in that direction; it doesn't seem to be a high priority concern for anybody.

regards, tom lane

This also explains why locking the table outside the function in a wrapping plpgsql block (as suggested by @ypercube) prevents the deadlocks.

Related Solutions

Sql-server – What are the main causes of deadlocks and can they be prevented

tracking deadlocks is the easier of the two:

By default, deadlocks are not written in the error log. You can cause SQL to write deadlocks to the error log with trace flags 1204 and 3605.

Write deadlock info to the SQL Server error log: DBCC TRACEON(-1, 1204, 3605)

Turn it off: DBCC TRACEOFF(-1, 1204, 3605)

See "Troubleshooting Deadlocks" for a discussion of trace flag 1204 and the output you will get when it is turned on. https://msdn.microsoft.com/en-us/library/ms178104.aspx

Prevention is more difficult, essentially you have to look out for the following:

Code Block 1 locks resource A, then resource B, in that order.

Code Block 2 locks resource B, then resource A, in that order.

This is the classic condition where a deadlock can occur, if the locking of both the resources is not atomic, the Code Block 1 can lock A and be pre-empted, then Code Block 2 locks B before A gets processing time back. Now you have deadlock.

To prevent this condition, you can do something like the following

Code Block A (psuedo code)

Lock Shared Resource Z
    Lock Resource A
    Lock Resource B
Unlock Shared Resource Z
...

Code Block B (pseudo code)

Lock Shared Resource Z
    Lock Resource B
    Lock Resource A
Unlock Shared Resource Z
...

not forgetting to unlock A and B when done with them

this would prevent the deadlocking between code block A and code block B

From a database perspective, I'm not sure on how to go about preventing this situation, as locks are handled by the database itself, i.e. row/table locks when updating data. Where I've seen the most issues occur is where you saw yours, inside a cursor. Cursors are notoriously inefficient, avoid them if at all possible.

Sql-server – How to monitor deadlocks

You can use Event Notifications to get notified whenever a deadlock happens on the server:

See:

Additional links:

The Anatomy of a Deadlock - Jonathan Kehayias
Anatomy of a Deadlock - Part Deux - Jonathan Kehayias
Read/Write deadlock - by Remus Rusanu