MySQL – How to Achieve Zero Downtime When Upgrading Database Schema

migrationMySQLschema

I am looking for a solution in achieving zero downtime schema upgrade with the assumption that every schema upgrade is backwards-compatible with the previous application version which makes things easier. Since the database is already huge, the upgrade may take more than 2 days to complete and the business cannot afford to stop their operations that long.

I found a candidate solution in
https://stackoverflow.com/a/40365479
where its steps are comprise of:

  • Create a temp table
  • Creates triggers on the first table (for inserts, updates, deletes) so that they are replicated to the temp table
  • In small batches, migrate data
  • When done, rename table to new table, and drop the other table

But my concern is would not it cause a potential race condition problem in the last step where for instance, we have the original table 'users' and the new table 'users_new', and our goal is to replace the original table with the new table? Because to accomplish that, won't it be needed to execute two separate DDL operations such as

DROP TABLE users;
ALTER TABLE users_new RENAME users

And by doing so, what if a query was executed against the users table during the time between the execution of 1st and 2nd DDL commands?

Best Answer

pt-online-schema-change in the Percona Toolkit is used in production in many companies for many years. It's more than a candidate solution.

It avoids the race condition by dropping old table later.

RENAME TABLE users TO user_old, users_new TO users;
DROP TABLE users_old;

Check out the source code around "Step 5: Rename tables: orig -> old, new -> orig"

pt-osc creates three triggers - on update, on delete and on insert. So when any changes are made during copying from users to users_new, the triggers reflect the change in users_new. So by a moment right before the RENAME, users and users_new are identical.