PostgreSQL – Optimizing bulk update performance in PostgreSQL with dependencies

Tags: dependencies, performance, postgresql, update

Basically my question is the same as this one, but WITH dependencies, so dropping/renaming the table is not a trivial option (I assume).

We are refactoring a large, poorly designed table which has many columns and references to it. It currently has a text field that should be a foreign key. The naive update looks like:

ALTER TABLE myTable ADD COLUMN new_id int REFERENCES list(id);
-- correlated subquery runs once per row; every row is rewritten
UPDATE myTable SET new_id = (SELECT id FROM list WHERE name = old_text);
ALTER TABLE myTable DROP COLUMN old_text;

The above takes practically forever because the table is large, and it temporarily doubles in size on disk: under PostgreSQL's MVCC, an UPDATE writes a new row version rather than changing the row in place, much like an INSERT followed by a DELETE.

We do not need everything done in one transaction, so we are considering an external script that performs the updates in batches of roughly 5000 rows, but tests indicate it will still be painfully slow.
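
A batched variant might look something like the sketch below. It assumes myTable has an integer primary key called id (hypothetical here) and joins against list instead of running a subquery per row:

-- Sketch only: an integer primary key "id" on myTable is assumed.
UPDATE myTable m
SET    new_id = l.id
FROM   list l
WHERE  l.name = m.old_text
AND    m.id >= 1 AND m.id < 5001;  -- advance this key range per batch
-- Commit, then repeat with the next range (5001..10000, and so on),
-- optionally vacuuming between batches so dead row versions can be reused.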

Suggestions on how to improve performance?

Best Answer

Given that you cannot afford to drop and recreate the table, this related answer would be a better fit:

You might drop indexes that are not needed during the update and recreate them when you are done (assuming they are not completely expendable).
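
For example (index names are hypothetical, just to illustrate the pattern):

-- Hypothetical index names, for illustration only:
DROP INDEX IF EXISTS mytable_old_text_idx;
-- ... run the bulk / batched UPDATEs here ...
CREATE INDEX mytable_new_id_idx ON myTable (new_id);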

And all the general advice for performance optimization applies.

There is not much more you can do if you have to update the table in place, bit by bit. The faster alternatives all drop the table and rewrite / recreate it from scratch.
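For reference, a minimal sketch of that rewrite-from-scratch route is below; with dependencies it only works if the foreign keys and views that point at myTable are dropped and recreated around it, which is exactly what the question wants to avoid:

-- Sketch only: handling of dependent FKs/views is assumed to happen separately.
CREATE TABLE myTable_new AS
SELECT m.*, l.id AS new_id          -- old_text is still included; drop it afterwards
FROM   myTable m
LEFT   JOIN list l ON l.name = m.old_text;
-- then: add constraints and indexes, re-point dependencies,
-- DROP TABLE myTable, and ALTER TABLE myTable_new RENAME TO myTable.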
