How to Rename a Column in PostgreSQL Without Disrupting Users

ddlpostgresql

You can rename a column using the following statement:

ALTER TABLE table_name RENAME COLUMN column_name TO new_column_name;

However, this brings the old column out of commission immediately. Any use sites that used the old column name will start having problems.

Is there a way to put the table into a transitory state in which both the old and new name can be used to refer to the column, so that I can move over my clients and then afterwards decommission the old name?

Best Answer

Rename the column, add a generated column with the old name. If inserts or updates of the old column haven't been corrected yet, provide insert and update triggers to handle the new column. When no logic refers to the old column anymore, drop it.

Related Solutions

PostgreSQL Update – Most Efficient Way to Perform Multiple Updates

If you don't want to interfere with other activity, the one UPDATE at a time in autocommit mode is very likely the best option. You should probably set synchronous_commit=off in that session (and only that session).

The indexes are going to slow you down, perhaps by a lot depending on your RAM and your IO system. But if the index is necessary for the other actions you don't want to interfere with, then there isn't anything you can do about it.

But since the fid is not yet correctly populated, the index on it is probably not actually useful to the concurrent processes you want avoid interfering with, as they haven't been changed yet to rely on that column being accurate. If that is the case, you can drop that index to gain speed, and build it in bulk later. The same probably applies to the foreign key constraint.

Once that index is gone, your updates can proceed via HOT (Heap Only Tuples) updates provided each block has enough free space. In that case, the updates will not have to do maintenance on the primary key index, either, saving that much more IO. To maximize the likelihood that this will work optimally, it is important that each UPDATE be its own transaction. That way one UPDATE can reuse space freed up by an earlier one.

Also, your WHERE clause should probably be like:

WHERE id=345 and fid is not null;

That way if the script gets interrupted, you can re-run it with minimal damage.

Since you seem to be running this on a test system already, then an EXPLAIN (ANALYZE,BUFFERS) of some of the updates would be helpful, especially with track_io_timing set to on.

PostgreSQL – Clustered Index vs. Table Ordered by Column

Like @dezso commented, creating a new table and dropping the old used to be faster in old versions, but not any more with the new implementation in pg 9.1.

The most common problem with CLUSTER is that it requires an exclusive lock on the table, which does not go well with concurrent access to it.

The solution to this problem is pg_repack, which does not lock the table exclusively.

Generally, make sure that your server configuration is fit for the task. High settings (a lot of RAM) for maintenance_work_mem would help both CLUSTER and CREATE INDEX on big tables. Standard setting is way too small for you. Follow the links for details.

You might set it very high temporarily for a transaction with SET LOCAL and leave it at a reasonable setting otherwise:

BEGIN;
SET LOCAL maintenance_work_mem = ????MB; -- find the sweet spot
CLUSTER tbl;
COMMIT;

If possible, set it high enough to fit the whole operation in RAM.

Best Answer

Related Solutions

PostgreSQL Update – Most Efficient Way to Perform Multiple Updates

PostgreSQL – Clustered Index vs. Table Ordered by Column

Related Question