Sql-server – How to drop a column from a large (700M rows) hot (multiple transactions per second) table with minimal downtime

lockingsql serversql-server-2016

My research suggests I can drop a column on a table in a matter of moments (provided I obtain an exclusive lock) because it is only a metadata change.

The table has two different services constantly inserting and updating records. It is very hot. Without developing a full reproduction environment of this level of activity, how can I ensure a statement like

ALTER TABLE x DROP COLUMN y 

will successfully obtain an exclusive lock, do the drop quickly, and then unlock the table?

How will SQL Server queue this request (eg First In First Out)?
Can I be sure the DROP COLUMN will truly only take a few moments?

Locking this table in PROD for an extended time will be unacceptable so I am trying to avoid surprises.

Other notes:
Later I will re-index ONLINE to defragment the index and reclaim the space.

Best Answer

How will SQL Server queue this request (eg First In First Out)? Can I be sure the DROP COLUMN will truly only take a few moments?

The way the locking works for ALTER TABLE is that you need a Schema Modification Lock (Sch-M) to perform the ALTER. Every other query that touches the table needs a Schema Stability Lock (Sch-S). Sch-M and Sch-S are incompatible.

Also while you are waiting to acquire the Sch-M lock, you block any new requests for a Sch-S lock. So you wait for all in-flight queries and transactions to release their Sch-S locks on the table, and block all new queries while you wait. This is the normal way DDL locking works, and so long as you don't have any long-running queries on your table it works fine.

But if you have long running queries that your DDL can get stuck behind, your DDL in turn blocks all new queries. So running DDL on a busy system with both short-running and long-running operations was a problem.

Low Priority Waits solves this problem by allowing your DDL statement to wait at reduced priority, not blocking new queries. The DDL session will just wait for a "window" where no other session has any lock on the table, and then run. But since the DDL session is not blocking new queries, it might wait forever.

Unfortunately low priority waits have not been enabled for dropping columns. You can create a feedback item to ask for it to be implemented here. Has been enabled for online index rebuild and partition switching.

You can mostly emulate this behavior with LOCK_TIMEOUT and retries. So configure your DDL session with a lock timeout, so it will abandon the ALTER TABLE if the lock can't be acquired quickly.

So run the alter like this:

set lock_timeout 1000 --1 second
alter table SomeTable
drop column if exists ColumnToDrop 

And it will either succeed quickly or fail with a lock timeout error. If you want to run this in an unattended job, you can use a retry loop to keep trying until the ALTER TABLE succeeds quickly.

Something like this:

set lock_timeout 1000 --1 second
declare @tries int = 0

while 1=1
begin 
    begin try
        set @tries = @tries + 1;

        alter table SomeTable
        drop column if exists ColumnToDrop 

        print 'complete'
        break;
    end try
    begin catch
      if @tries > 10 or error_number() <> 1222 throw;

      declare @msg nvarchar(2000) = concat('retry ', @tries);
      print @msg
    end catch
end