How do most "popular" (MySQL, Postgres…) database system handle altering tables on live production databases (like adding, deleting or changing the type of colums)?
I know the correct way is to backup everything schedule downtime and do then do the changes.
But… does any current database system support doing these things "on-line" without stopping anything? (maybe just delaying the queries that reference a column that is just being changed/deleted)
And what does it happen when I just do an ALTER TABLE...
on a live running database?
Does everything stop when this happens?
Can data get corrupted?
etc.
Again, I'm mostly referring to Postgres or MySQL as these are what I encounter.
(And, yes, anytime I had to do this before I did it "the right way", backing things up, scheduling downtine etc. …but I just want to know if it's possible to do this sort and things "quick and dirty" or if there is any DB system that actually has support for "quick, live and dirty" schema changes)
Someone just suggested Online Schema Change for MySQL from Facebook script (with a tutorial here and source here)… seems like a nice way to automate a the set of "hacky" ways to do it… has anyone ever used it in something resemblig production?
Best Answer
When you issue an
ALTER TABLE
in PostgreSQL it will take anACCESS EXCLUSIVE
lock that blocks everything includingSELECT
. However, this lock can be quite brief if the table doesn't require re-writing, no newUNIQUE
,CHECK
orFOREIGN KEY
constraints need expensive full-table scans to verify, etc.If in doubt, you can generally just try it! All DDL in PostgreSQL is transactional, so it's quite fine to cancel an
ALTER TABLE
if it takes too long and starts holding up other queries. The lock levels required by various commands are documented in the locking page.Some normally-slow operations can be sped up to be safe to perform without downtime. For example, if you have table
t
and you want to change columncustomercode integer NOT NULL
totext
because the customer has decided all customer codes must now begin with anX
, you could write:... but that would lock the whole table for the re-write. So does adding a column with a
DEFAULT
. It can be done in a couple of steps to avoid the long lock, but applications must be able to cope with the temporary duplication:This will only prevent writes to
t
during the process; the lock nameEXCLUSIVE
is somewhat deceptive in that it excludes everything exceptSELECT
; theACCESS EXCLUSIVE
mode is the only one that excludes absolutely everyting. See lock modes. There's a risk that this operation could deadlock-rollback due to the lock upgrade required by theALTER TABLE
, but at worst you'll just have to do it again.You can even avoid that lock and do the whole thing live by creating a trigger function on
t
that whenever anINSERT
orUPDATE
comes in, automatically populatescustomercode_new
fromcustomercode
.There are also built-in tools like
CREATE INDEX CONCURRENTLY
andALTER TABLE ... ADD table_constraint_using_index
that're designed to allow DBAs to reduce exclusive locking durations by doing work more slowly in a concurrency-friendly way.The
pg_reorg
tool or its successorpg_repack
can be used for some table restructuring operations as well.