PostgreSQL – Best Practices for Schema Changes and Data Migrations Without Downtime

best-practices, deployment, migration, postgresql, schema

How do you make schema changes to a live database without downtime?

For instance, let's say I have a PostgreSQL database with a table containing various user data, such as email addresses, all associated with specific users. If I wanted to move the email addresses to a new dedicated table, I'd have to change the schema and then migrate the email data across to the new table. How could this be done without stopping writes to the original table? Surely while data is being copied from the old table to the new one, new data would continue to be written to the old table and be missed, right?

I guess this problem comes up pretty frequently but I can't find any standard solution for dealing with it.

This article deals with the problem, but I didn't really understand step 3. It says to write to both tables, then migrate old data from the first table to the new one. How do you make sure you're only migrating old data?

(I use PostgreSQL on Heroku.)

Best Answer

You almost have your answer already:

  1. Create the new structure in parallel
  2. Start writing to both structures
  3. Migrate old data to the new structure
  4. Only write and read new structure
  5. Delete old columns
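
For step 2, one common way to write to both structures without touching application code is a trigger on the old table. A minimal sketch, assuming hypothetical names: a `users` table with an `email` column and a new `user_emails` table (the table, column, and function names here are mine, not from the question). `ON CONFLICT` needs PostgreSQL 9.5+, and `EXECUTE FUNCTION` needs 11+ (use `EXECUTE PROCEDURE` on older versions):

```sql
-- New structure in parallel (step 1).
CREATE TABLE user_emails (
    user_id bigint PRIMARY KEY REFERENCES users (id),
    email   text NOT NULL
);

-- Mirror every insert/update of users.email into user_emails (step 2).
CREATE FUNCTION sync_user_email() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO user_emails (user_id, email)
    VALUES (NEW.id, NEW.email)
    ON CONFLICT (user_id) DO UPDATE SET email = EXCLUDED.email;
    RETURN NEW;
END;
$$;

CREATE TRIGGER users_email_sync
AFTER INSERT OR UPDATE OF email ON users
FOR EACH ROW EXECUTE FUNCTION sync_user_email();
```

Once the trigger is in place, every new or changed row is already in the new table, which is exactly why the backfill in step 3 only has to worry about old data.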

As for step 3, use something like this (in one transaction):

Insert what is not there yet:

INSERT INTO new_tbl (old_id, data)
SELECT old_id, data
FROM   old_tbl
WHERE  NOT EXISTS (SELECT * FROM new_tbl WHERE new_tbl.old_id = old_tbl.old_id);

Update what has changed in the meantime:

UPDATE new_tbl
SET    data = old_tbl.data
FROM   old_tbl
WHERE  new_tbl.old_id = old_tbl.old_id
AND    new_tbl.data IS DISTINCT FROM old_tbl.data;

New data will not be touched, because it is identical in both places.
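
If rows can also be deleted from the old table while the backfill runs, the same transaction can remove the corresponding rows from the new table. A sketch using the same `old_tbl` / `new_tbl` names as above (skip this if your application never deletes, or if the trigger from step 2 already propagates deletes):

```sql
-- Remove rows from the new table that no longer exist in the old one.
DELETE FROM new_tbl
WHERE  NOT EXISTS (
    SELECT 1
    FROM   old_tbl
    WHERE  old_tbl.old_id = new_tbl.old_id
);
```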