Postgresql – Is it bad practice to add columns to an existing table

alter-tabledatabase-designpostgresql

Conversely: Is it better to get all the columns created at the time when a new table is created?

I'm working on a new system and new requirements are coming up all the time. The latest requirement is to add a new tag field to each customer so that it will be easier to correlate data between the old and the new system. The new system is not in production yet, but some test runs of the migration process have been done.

At the moment it is still viable to delete the table and rerun the batch load, but what about the future – when there is a lot of live data in the system and a requirement for a new column comes up: Does it have any impact on performance for example, to export the data, recreate the table with all the columns, and import the data again, over just doing an ALTER TABLE ADD column …. ?

In case it makes any difference, the solution is based on PostgreSQL 9.5, and if it does make a difference it would be interesting to know which DBMS'es care more or less.

Does whether or not an index is created on this column affect the answer? E.g when a Unique constraint is set.

Best Answer

ALTER TABLE is there for a reason. More seriously, unless you plan to have a really big data set, you shouldn't worry about adding new columns on demand. Dropping and recreating the table is only viable until you have no (important) data, meaning you have to use ALTER TABLE ... ADD COLUMN later in any case.

When I mention 'really big' above, it's about aligment and padding of the different data types used for the column definitions. For example, a row of a table with columns (in this order) (smallint, integer, smallint) will be slightly (2 bytes) wider than one with (smallint, smallint, integer). This starts to make a difference only with tables of 10s (on decent hardware maybe 100s) of million rows or when the table has many columns and a lot of rows. More about this in Erwin Brandstetter's excellent answer.

When adding a new column in pre-11 Postgres versions, beware the trap of ALTER TABLE ... ADD COLUMN ... NOT NULL DEFAULT .... As ALTER TABLE ... ADD COLUMN requires a heavy lock that prevents access to the table for concurrent sessions, you have to keep the transaction as short as possible. If you manage to do this, adding a new column will be hardly noticeable performance-wise to the other processes.