PostgreSQL – Adding a nullable column to a table costs more than 10 minutes

alter-table, database-design, postgresql, postgresql-9.1

I am having problems adding a new column to a table.
I tried running the statement a couple of times, but after it had been running for more than 10 minutes I cancelled the query because of how long the table was locked.

ALTER TABLE mytable ADD mycolumn VARCHAR(50);

Useful information:

  • PostgreSQL version: 9.1
  • Number of rows: ~ 250K
  • Number of columns: 38
  • Number of nullable columns: 32
  • Number of constraints: 5 (1 PK, 3 FK, 1 UNIQUE)
  • Number of indexes: 1
  • OS type: Debian Squeeze 64

I found interesting information about the way PostgreSQL manages nullable columns (via HeapTupleHeader).

My first guess is that, because this table already has 32 nullable columns with an 8-bit MAXALIGN, the HeapTupleHeader is 4 bytes long (not verified, and I don't know how to verify it).

So adding a new nullable column could require updating the HeapTupleHeader on every row to add a new 8-bit MAXALIGN, which could cause performance issues.
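
Maybe something like the pageinspect contrib module could confirm this. Just a sketch of what I have in mind, not something I have run (installing the extension needs superuser rights):

    CREATE EXTENSION pageinspect;

    -- t_hoff is the total header length per tuple (including any null bitmap),
    -- t_bits shows the bitmap itself when one is present
    SELECT lp, t_hoff, t_bits
    FROM   heap_page_items(get_raw_page('mytable', 0))
    LIMIT  5;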

So I tried altering one of the nullable columns (one that is never actually NULL in practice) to bring the number of nullable columns down to 31, to check whether my guess could be true.

ALTER TABLE mytable ALTER myothercolumn SET NOT NULL;

Unfortunately, this ALTER also took a very long time (more than 5 minutes), so I aborted it as well.

Do you have any idea what could be causing this performance cost?

Best Answer

There are a couple of misunderstandings here:

The null bitmap is not part of the heap tuple header. Per documentation:

There is a fixed-size header (occupying 23 bytes on most machines), followed by an optional null bitmap ...

Your 32 nullable columns are unsuspicious, for a couple of reasons:

  • The null bitmap is added per row, and only if there is at least one actual NULL value in the row. Nullable columns have no direct impact; only actual NULL values do. If the null bitmap is allocated, it is always allocated completely (all or nothing). Its actual size is 1 bit per column, rounded up to the next byte (see the worked example after this list). Per the current source code:

    #define BITMAPLEN(NATTS) (((int)(NATTS) + 7) / 8)
    
  • The null bitmap is allocated after the heap tuple header, followed by an optional OID and then the row data. The start of the OID or row data is indicated by t_hoff in the header. Per a comment in the source code:

    Note that t_hoff must be a multiple of MAXALIGN.

  • The heap tuple header occupies 23 bytes, so there is one free byte before the next MAXALIGN boundary. The null bitmap for rows with up to 8 columns therefore comes at no additional cost. With the 9th column in the table, t_hoff is advanced by another MAXALIGN (typically 8 bytes), which provides room for another 64 columns, so the next border is at 72 columns.
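
As a worked example for the table at hand (a sketch: the 38-column count is taken from the question, and MAXALIGN is assumed to be 8), the same arithmetic in plain SQL:

    -- null bitmap size per BITMAPLEN: 1 bit per column, rounded up to full bytes
    SELECT (38 + 7) / 8 AS bitmap_bytes;                  -- 5

    -- t_hoff: 23-byte header + bitmap, rounded up to the next multiple of MAXALIGN
    SELECT ((23 + (38 + 7) / 8) + 7) / 8 * 8 AS t_hoff;   -- 32

The bitmap covers all 38 columns whenever it is allocated, so the number of nullable columns does not enter into it, and 38 columns (or 39 after the ALTER) stay well below the 72-column border.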

To display control information about a PostgreSQL database cluster (including MAXALIGN), here is an example for a typical installation of Postgres 9.3 on a Debian machine:

    sudo /usr/lib/postgresql/9.3/bin/pg_controldata /var/lib/postgresql/9.3/main

I updated the instructions in the related answer you quoted.

All that aside, even if your ALTER TABLE statement triggered a whole table rewrite (adding a nullable column without a default does not; changing a data type typically would), 250K rows is really not that much and would be a matter of seconds on any halfway decent machine (unless the rows are unusually big). 10 minutes or more indicate a completely different problem. Most likely, your statement is waiting to get a lock on the table.

The growing number of entries in pg_stat_activity means more open transactions - most likely concurrent access to the table that has to wait for your operation to finish.
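
To check whether that is the case, something along these lines against the 9.1 catalogs shows who holds or waits for locks on the table (a sketch; column names changed in later versions):

    SELECT a.procpid, a.waiting, l.mode, l.granted, a.current_query
    FROM   pg_locks l
    JOIN   pg_stat_activity a ON a.procpid = l.pid
    WHERE  l.relation = 'mytable'::regclass
    ORDER  BY l.granted, a.query_start;

Sessions with granted = false are waiting; a long-running (or idle-in-transaction) session holding a conflicting lock is the usual suspect.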

A few shots in the dark

Check for possible table bloat. Try a gentle VACUUM mytable, or a more aggressive VACUUM FULL mytable - which might run into the same concurrency issues, since this form also acquires an exclusive lock. You could try pg_repack instead ...
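
To gauge bloat before reaching for VACUUM FULL or pg_repack, a rough check (a sketch; interpret the numbers in relation to your live data):

    -- dead tuples and on-disk footprint (the total includes indexes and TOAST)
    SELECT n_live_tup, n_dead_tup,
           pg_size_pretty(pg_relation_size('mytable'))       AS heap_size,
           pg_size_pretty(pg_total_relation_size('mytable')) AS total_size
    FROM   pg_stat_user_tables
    WHERE  relname = 'mytable';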

I would start by inspecting possible issues with indexes, triggers, foreign keys or other constraints, especially those involving the column in question. A corrupted index might also be involved: try REINDEX TABLE mytable;, or drop all indexes and re-create them after the ALTER TABLE in the same transaction.
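
For the drop-and-recreate route, everything can go into a single transaction. A sketch with a hypothetical index name; adapt it to your actual indexes (indexes backing constraints need DROP / ADD CONSTRAINT instead):

    BEGIN;
    DROP INDEX myothercolumn_idx;           -- hypothetical index name

    ALTER TABLE mytable ADD mycolumn varchar(50);

    CREATE INDEX myothercolumn_idx ON mytable (myothercolumn);
    COMMIT;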

Try running the command at night, or whenever there is not much load.

A brute-force method would be to stop all access to the server, then try again without concurrent sessions.
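
From SQL, that could look like the following (a sketch; mydb stands in for your database name, pg_terminate_backend() needs superuser rights, and the connection limit does not apply to superusers):

    -- refuse new connections to the database ...
    ALTER DATABASE mydb CONNECTION LIMIT 0;

    -- ... terminate the sessions that are still connected ...
    SELECT pg_terminate_backend(procpid)
    FROM   pg_stat_activity
    WHERE  datname = 'mydb'
    AND    procpid <> pg_backend_pid();

    -- ... run the ALTER TABLE, then allow connections again:
    ALTER DATABASE mydb CONNECTION LIMIT -1;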

Without being able to pin the problem down, upgrading to the current version, or to the upcoming 9.4 in particular, might help; there have been several improvements for big tables and for locking details. But if something is broken in your DB, you should probably figure that out first.