SQL Server – pessimistic locking and client death

locking, postgresql, sql-server

We are considering pessimistic locking for our project (SELECT ... FOR UPDATE). As we are used to optimistic locking, we are concerned about server operation being blocked by such a lock (a lock that never gets released).
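For concreteness, this is the kind of pattern we mean (table and column names here are hypothetical, just to illustrate):

```sql
-- Pessimistic locking sketch: the row lock taken by FOR UPDATE
-- is held until the transaction ends with COMMIT or ROLLBACK.
BEGIN;
SELECT balance FROM accounts WHERE id = 42 FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE id = 42;
COMMIT;  -- lock released here; if the client dies before this point,
         -- the server has to notice the dead connection to release it
```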

The most common concern people have is obviously deadlocks. We have another concern, however: what if the client dies, or loses its connection to the server, while holding a SELECT ... FOR UPDATE lock?

In many cases this would not matter, because if the client dies, the application dies with it. In our case, however, we will have a cluster of servers with failover, and we wouldn't want the other nodes in the cluster to be overly affected if one node dies.

On this topic I have found this 2003 message concerning PostgreSQL:
http://www.postgresql.org/message-id/3E81D7F0.2010604@xythos.com

We have also tested the tcp_keepalives_idle PostgreSQL configuration parameter, and confirmed that if we set it low enough and kill the client, the lock gets released. If we left the parameter at its default, the lock did not get released (but maybe we didn't wait long enough).
And that's without considering the concern raised by the 2003 message above, about the difference between the life-cycle of the client application and the life-cycle of the connection pool (we'll keep the default on this, but we expect each node to have its own connection pool, with no sharing across the cluster).
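The settings we experimented with look like this in postgresql.conf (the values below are examples from our tests, not recommendations; a value of 0 means "use the operating system default"):

```ini
# postgresql.conf — server-side TCP keepalive settings
tcp_keepalives_idle = 60      # seconds of inactivity before the first keepalive probe
tcp_keepalives_interval = 10  # seconds between unanswered probes
tcp_keepalives_count = 5      # unanswered probes before the connection is considered dead
```

With these example values, a dead client would be detected after roughly 60 + 10 × 5 seconds, at which point its transaction is rolled back and its locks released.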

The problem gets even more complicated because we intend to support several database servers (at least Microsoft SQL Server and PostgreSQL, possibly Oracle).

Our solution would use JavaEE and Glassfish as an application server.

So, to summarize the question: apart from deadlocks, is it safe to assume that a lock cannot stay held forever, and that it will be released after a reasonably short time (what range of duration?), even if the client holding it dies suddenly or loses connectivity?

Best Answer

No, I don't think it's safe to assume locks from dead/vanished clients are released in a bounded and deterministic amount of time with all DBMSes and drivers. You'll need to investigate each configuration separately.

In the case of PostgreSQL you're generally but not always OK if you have TCP keepalives set quite aggressively, because:

  • If the whole client application process dies but the client host stays up, the host's kernel will RST the TCP connection as part of process cleanup;
  • If the client host dies entirely, it will stop responding to TCP keepalives; and
  • If the client host remains alive but the network fails in one or both directions between client and server, it will stop responding to TCP keepalives.

However, there are a few cases that will not be handled:

  • Connection pool bugs that result in a connection being returned to the pool with a transaction still open and holding locks;
  • Connection pools that don't DISCARD ALL and thus fail to release and reset session-level resources like advisory locks (if you use them);
  • App server based applications that 'leak' connections with open transactions so the connection pool can never reclaim them;
  • Badly written programs that intentionally hold a transaction open during user "think time" like a dialog box or data entry window, where the user might go away and make a coffee ... or go on holiday for a month;
  • Cases where the application process remains in existence but is totally non-responsive due to being SIGSTOPped, having been paused by a debugger, hitting an internal threading deadlock, etc. The OS will keep on responding to TCP keepalives, but the app won't respond to Pg protocol messages or advance its work.

In the case of PostgreSQL you can use active lock monitoring to scan for and terminate long-running transactions that haven't done anything in a while. In particular, you can deal with idle-in-transaction sessions by scanning pg_stat_activity (though doing this reliably and easily is only possible in 9.2 and later). With a bit more effort you can use pg_locks to watch for queries blocked on a lock for more than x seconds and kill the session holding the lock, though this can make it hard to run some DDL like index creation.
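A monitoring job along these lines could run queries like the following against a 9.2+ server (the five-minute threshold is an arbitrary example):

```sql
-- List sessions that have sat idle inside an open transaction too long:
SELECT pid, usename, state_change, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND state_change < now() - interval '5 minutes';

-- Terminate them; the backend rolls back its transaction and releases its locks:
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND state_change < now() - interval '5 minutes';
```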

What you really need is application level keepalives, where the app says "Yup, I'm alive and responsive". These are rather harder to implement, though.

One thing that will help is that both PgBouncer and PgPool-II (external connection pools for PostgreSQL) support controls for session and transaction timeouts. We've wanted to implement similar options in core PostgreSQL for some time, but nobody has come up with a design robust enough to handle all the corner cases, so for now your best bet is to use an external pooler. You can do this even if you're also using an application-level connection pool.
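As a sketch, a PgBouncer configuration using these timeouts might look like the fragment below (values in seconds are arbitrary examples; check the exact parameter names and semantics against the documentation for your PgBouncer version):

```ini
; pgbouncer.ini fragment
[pgbouncer]
pool_mode = transaction
query_timeout = 120              ; cancel queries that run longer than this
idle_transaction_timeout = 300   ; drop server connections left idle in an open transaction
client_idle_timeout = 600        ; drop client connections idle longer than this
```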

On the good news front, PostgreSQL automatically detects and breaks deadlocks between transactions, so one thing you don't have to worry much about when using PostgreSQL is deadlocks at the SQL level.