The query under process ID 180233 looks like it is in distress.
Here is the query itself:
SELECT COUNT(DISTINCT A.`campaignid`) INTO _c
FROM `ox_campaigns` A
INNER JOIN `selfserving_users` B ON B.`user_id` = A.`uid`
INNER JOIN `v3_cam_date` C ON C.`campaignid` = A.`campaignid`
WHERE A.`revenue_type` = 5 AND A.`deleted` = 0
AND A.`expire` = DATE_ADD(CURRENT_DATE, INTERVAL 1 DAY)
AND A.`isExpired` = 0
AND IF( NAME_CONST('_permitid',3) = -1, 1=1,
IF( NAME_CONST('_permitid',3) = 0, A.`uid` IN
(SELECT C.`user_id` FROM `selfserving_users` C
WHERE C.`groupid` = NAME_CONST('_groupid',12) ) ,
A.`uid` = NAME_CONST('userid',388)));
The scary part about the query is the self-reference: you have selfserving_users acting in a self-serving manner against itself.
Sometimes, the MySQL Query Optimizer will play bait-and-switch, smoke-and-mirrors games with data, especially with a self-reference, in order to formulate the best EXPLAIN plan possible. While MySQL is very capable of completing sub-SELECTs, they can still be expensive.
However, this is just a symptom that manifested because of Process ID 97. What is the real issue here?
LOAD DATA INFILE against an InnoDB table can make mysqld a little punch-drunk. I do not have full confidence that you can encapsulate it as a normal transaction, although this was addressed back in MySQL 5.0.
Just picture it:
- You are hammering the InnoDB Buffer
- Some memory swapping may be going on
- Possible full-table locking issues that are affecting data pages outside the v3_zone_date table (such as with the selfserving_users table)
There may be a way to throttle the LOAD DATA INFILE process on an InnoDB table. I cannot give you a solid answer on this one, but try this link from Baron Schwartz.
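One low-tech way to throttle the import, if chunking is an option for you, is to split the input file and commit between pieces so InnoDB's row-lock count and undo data stay bounded. A sketch (the file names, target table, and LOAD DATA options here are placeholders, not your actual setup):

```sql
-- Hypothetical sketch: load the data in smaller chunks, committing
-- between each chunk so locks and undo data are released periodically.
SET autocommit = 0;

LOAD DATA INFILE '/tmp/v3_zone_date.part01.csv' INTO TABLE v3_zone_date
  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
COMMIT;

LOAD DATA INFILE '/tmp/v3_zone_date.part02.csv' INTO TABLE v3_zone_date
  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
COMMIT;

SET autocommit = 1;
```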
UPDATE 2012-02-22 12:00 EST
There is an open bug report against MySQL 5.5.7 called "Deadlock when DDL under LOCK TABLES WRITE, READ + PREPARE". At the bottom of the report, a commenter described a blocking problem caused by an explicit LOCK TABLES.
Launching a COMMIT on locked rows in a table would hang because it tries to unravel MVCC data associated with the locked rows. Based on the InnoDB status you have shown, there are 6933 row locks on the table you are importing. I know that in Oracle, when introducing new rows to a table, MVCC data is still generated because the previous version of a newly inserted row is a nonexistent row. The same must be occurring in InnoDB.
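You can watch those row locks accumulate while the import is running; the TRANSACTIONS section of the InnoDB status output is where a figure like "6933 row lock(s)" comes from:

```sql
-- The TRANSACTIONS section of this output lists each open transaction
-- along with the number of row locks it currently holds.
SHOW ENGINE INNODB STATUS\G
```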
UPDATE 2012-02-22 12:42 EDT
In your question you stated the following about your .NET process:
- LOCK TABLES;
- SET autocommit=0;
- SET unique_checks=0;
- SET foreign_key_checks=0;
- LOAD DATA;
- COMMIT;
- UNLOCK TABLES;
- SET autocommit=1;
- SET unique_checks=1;
All of these events are running within the same DB Session. This is also happening within one DB Connection. Thus, this is not a deadlock in the traditional sense. It is just a case of blocking your COMMIT within a given DB Connection/Session because the tables were locked within the same DB Connection/Session.
UPDATE 2012-02-23 19:00 EDT
I would change the sequence to be this:
- SET autocommit=0;
- SET unique_checks=0;
- SET foreign_key_checks=0;
- LOCK TABLES;
- LOAD DATA;
- UNLOCK TABLES;
- COMMIT;
- SET autocommit=1;
- SET unique_checks=1;
- SET foreign_key_checks=1;
Please remember, a COMMIT cannot proceed if you still have the tables locked in this fashion. Therefore, UNLOCK TABLES must precede COMMIT.
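Put together, the corrected session would look something like this (the table and file names are placeholders):

```sql
-- Sketch of the corrected session order.
SET autocommit = 0;
SET unique_checks = 0;
SET foreign_key_checks = 0;

LOCK TABLES v3_zone_date WRITE;
LOAD DATA INFILE '/tmp/v3_zone_date.csv' INTO TABLE v3_zone_date;
UNLOCK TABLES;   -- release the table locks first ...
COMMIT;          -- ... so the COMMIT is not blocked

SET autocommit = 1;
SET unique_checks = 1;
SET foreign_key_checks = 1;
```

As I understand it, when tables have been locked with LOCK TABLES, UNLOCK TABLES itself implicitly commits the active transaction, so the explicit COMMIT that follows is belt-and-suspenders; either way, the unlock has to come first.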
Eelke is almost certainly correct that your locking is blocking autovacuum. Autovacuum is deliberately designed to give way to user activity. If those tables are locked, autovacuum cannot vacuum them.
For posterity, however, I wanted to give an example set of settings for hyper-aggressive autovacuum, since the settings you gave don't quite do it. Note, though, that making autovacuum more aggressive is unlikely to solve your problem. Also note that the default autovacuum settings are based on over 200 test runs using DBT2, seeking an optimal combination of settings, so the defaults should be assumed to be good unless you have a solid reason to think otherwise, or unless your database is significantly outside the mainstream for OLTP databases (e.g. a tiny database that gets 10K updates per second, or a 3TB data warehouse).
First, turn on logging so you can check up on whether autovacuum is doing what you think it is:
log_autovacuum_min_duration = 0
Then let's make more autovac workers and have them check tables more often:
autovacuum_max_workers = 6
autovacuum_naptime = 15s
Let's lower the thresholds for auto-vacuum and auto-analyze to trigger sooner:
autovacuum_vacuum_threshold = 25
autovacuum_vacuum_scale_factor = 0.1
autovacuum_analyze_threshold = 10
autovacuum_analyze_scale_factor = 0.05
Then let's make autovacuum less interruptible, so it completes faster, but at the cost of having a greater impact on concurrent user activity:
autovacuum_vacuum_cost_delay = 10ms
autovacuum_vacuum_cost_limit = 1000
There's your full program for generically aggressive autovacuum, which might be appropriate for a small database getting a very high rate of updates, but might have too great an impact on concurrent user activity.
Also, note that autovacuum parameters can be adjusted per table, which is almost always a better answer for needing to adjust autovacuum's behavior.
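For example, to apply the lowered thresholds above to just one high-churn table while leaving the global defaults alone (the table name here is hypothetical):

```sql
-- Tune autovacuum for a single hot table via its storage parameters.
ALTER TABLE my_hot_table SET (
    autovacuum_vacuum_threshold = 25,
    autovacuum_vacuum_scale_factor = 0.1,
    autovacuum_analyze_threshold = 10,
    autovacuum_analyze_scale_factor = 0.05
);
```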
Again, though, it's unlikely to address your real problem.
Best Answer
What you are seeing are transactions which are running, holding locks, and not doing anything. What you need to do is trace the transactions back to their source and find out what they are doing. They may be other admins who have logged in and forgotten to log out.
In general it is usually a bad thing to hold transactions open and idle for a long time due to this sort of locking problem.
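A sketch for tracking those sessions down, assuming PostgreSQL 9.2 or later (earlier versions expose this through current_query = '&lt;IDLE&gt; in transaction' rather than a state column):

```sql
-- List sessions that opened a transaction and then went idle,
-- oldest transaction first, so you can see who to chase down.
SELECT pid, usename, client_addr, xact_start, query_start, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_start;
```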