SQL Server – Best Practices for Updating Large Tables

alter-tablebest practicessql serversql-server-2008-r2

I know there are a number of topics on this question, but I'm always seeking more insights.

I have a large table with a billion+ records. The amount of records could be reduced and archived, but the size will still be large. My task is to change a existing data type of a single column where the old value of data is safe to convert into the type.

Here are some of my approaches:

1 – Drop the constraints on the table that impact the targeted column, drop the indexes that also impact the targeted column, add a new column with NULL's at the end of the table, update the new column with the old column values in chunks of 10K, 50K or 100K increments, drop the old column when data has been copied and reapply indexes for that column only.

2 – Copy all data into a new table with the data type change in chunks as before, verify data is completed, drop the old table, rename the new table to the old and apply indexes.

3 – Import all data from another data source like a flat file to a new table with the data type change using BULK INSERT and MERGE SP's, basically like option 2 with having 2 duplicate tables, verify data, drop old to replace with new table and apply indexes.

What would be the fastest and safest option? Are there other options I'm not considering? I've updated 100 million records for other tables really well with option 1. The bigger the table, the harder option 1 becomes due to the time duration of updating.

Best Answer

I have always gone with Option #2, and have done so in such a way that there was minimal impact on the system as it was always assumed that that application would be running continuously, or down for 30 minutes tops. So, I would build the new table and slowly migrate the data into it over the course of a week. I would use a SQL Agent job to insert rows and an AFTER UPDATE, DELETE trigger to keep already migrated rows in sync.

I have detailed this approach in an article on SQL Server Central: Restructure 100 Million Row (or more) Tables in Seconds. SRSLY!.