If you don't want to interfere with other activity, the one UPDATE at a time in autocommit mode is very likely the best option. You should probably set synchronous_commit=off in that session (and only that session).
The indexes are going to slow you down, perhaps by a lot depending on your RAM and your IO system. But if the index is necessary for the other actions you don't want to interfere with, then there isn't anything you can do about it.
But since the fid is not yet correctly populated, the index on it is probably not actually useful to the concurrent processes you want avoid interfering with, as they haven't been changed yet to rely on that column being accurate. If that is the case, you can drop that index to gain speed, and build it in bulk later. The same probably applies to the foreign key constraint.
Once that index is gone, your updates can proceed via HOT (Heap Only Tuples) updates provided each block has enough free space. In that case, the updates will not have to do maintenance on the primary key index, either, saving that much more IO. To maximize the likelihood that this will work optimally, it is important that each UPDATE be its own transaction. That way one UPDATE can reuse space freed up by an earlier one.
Also, your WHERE clause should probably be like:
WHERE id=345 and fid is not null;
That way if the script gets interrupted, you can re-run it with minimal damage.
Since you seem to be running this on a test system already, then an EXPLAIN (ANALYZE,BUFFERS) of some of the updates would be helpful, especially with track_io_timing set to on.
Like @dezso commented, creating a new table and dropping the old used to be faster in old versions, but not any more with the new implementation in pg 9.1.
The most common problem with CLUSTER
is that it requires an exclusive lock on the table, which does not go well with concurrent access to it.
The solution to this problem is pg_repack
, which does not lock the table exclusively.
Generally, make sure that your server configuration is fit for the task. High settings (a lot of RAM) for maintenance_work_mem
would help both CLUSTER
and CREATE INDEX
on big tables. Standard setting is way too small for you. Follow the links for details.
You might set it very high temporarily for a transaction with SET LOCAL
and leave it at a reasonable setting otherwise:
BEGIN;
SET LOCAL maintenance_work_mem = ????MB; -- find the sweet spot
CLUSTER tbl;
COMMIT;
If possible, set it high enough to fit the whole operation in RAM.
More:
Best Answer
Rename the column, add a generated column with the old name. If inserts or updates of the old column haven't been corrected yet, provide insert and update triggers to handle the new column. When no logic refers to the old column anymore, drop it.