Alter table alter column causes huge log and data file growth

alter-tabletransaction-log

I have a table with 2.6B rows in it and I need to change a column from Char(1) to Varchar(64). the table is over 300GB with a few indexes. The column in nullable, so the transaction i'm running is:

ALTER TABLE XXXX ALTER COLUMN YYYYY varchar(64) NULL

I understand that this is a logged operation, so i pre-sized the transaction log to 300GB thinking it would be enough especially with log backups every 5 minutes to allow log space reuse. well the transaction log grew to 812GB before I had to cancel the transaction due to drive space issues.

I also experienced a very large increase in data file USED size which I don't know why that would be. data file used size came to right around 200GB (very minimal transactions are done in this DB so i know the increase is from this alter table command).

I have some questions:

  1. Why would the data files experience an increase when just changing from char(1) to varchar(64)? I was under the impression that this should not change the amount of data stored unless that space is actually needed since its varchar. all existing values were either null or 1 byte since char(1) is the existing datatype, no existing data needed to be expanded.
  2. While trying to figure out how to do this better, I came across this answer from Aaron Bertrand and it seems like it would work in my case as well. would this be a better way to accomplish this task?
  3. this DB is in a 2 node synchronous commit AG and i noticed the redo_queue_size got to over 130GB, which means that the primary was sending log data at a faster pace than the secondary could apply them. This means that the log on the primary could not truncate when the log backups were completed. is this a normal behavior for something like this? is the alter column statement processed as one large transaction? if so that would explain why the log kept growing.

Best Answer

  1. minimum storage for a VARCHAR is 4 bytes, I think, so you did change the size of the column from 1 byte to at least 4 bytes, causing (as you have observed) large amounts of data movement as the column grows and it shuffles other things around it.

  2. I would recommend Aaron's approach as a faster method to achieve the same end. The downside to it is that column order is not preserved.... but since we should always (lol) use named columns for our selects and inserts, the order of storage shouldn't matter. (may need to refresh frameworks....)

  3. You are correct. DDL is a logged transaction and thus the behavior you observed in your log file / AG health is to be expected.