SQL Server – Moving data into an archive table and reducing the size of the original table

disk-space, sql-server, sql-server-2005

On a SQL Server instance I inherited, there is a 55GB database on a 70GB drive. There are many small tables, and one big table that is 50GB in size (approximately 36GB data, 15GB index). The log file is on a different drive; after a shrinkfile, that drive has about 12GB free. The database is in simple recovery mode.

The table is badly designed; for example, there are many columns defined as char(10) that never contain more than 3 characters.

There isn't enough free space on the drive to alter these columns and then rebuild the clustered index (when altering a column, the database internally creates a new column and copies the data over from the old one).

I'm wondering if I can instead do the following:

  • Create an archive table with a much better space-saving design (columns appropriately sized for the data they contain)
  • Incrementally move old data from the huge table to the archive table.
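For the incremental move, one common pattern is a batched DELETE with an OUTPUT clause, which moves and removes each batch of rows in a single small transaction. This is only a sketch - the table names, the date column, and the cutoff value are all hypothetical stand-ins:

```sql
-- Hypothetical names: dbo.BigTable (source), dbo.BigTableArchive (target),
-- CreatedDate as the column that identifies "old" rows.
DECLARE @BatchSize int = 5000;  -- tune this to balance log growth vs. run time

WHILE 1 = 1
BEGIN
    DELETE TOP (@BatchSize) src
    OUTPUT DELETED.*              -- explicit column lists are safer; * shown for brevity
    INTO dbo.BigTableArchive
    FROM dbo.BigTable AS src
    WHERE src.CreatedDate < '20100101';   -- your "old data" cutoff

    IF @@ROWCOUNT = 0 BREAK;      -- nothing left to move

    CHECKPOINT;  -- simple recovery: lets log space be reused between batches
END
```

Keeping each transaction small matters here: in simple recovery, the log can truncate at each checkpoint, so the log file never needs to hold the whole migration at once.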

Will the source table shrink when data is deleted from it (after being moved to the archive table)? Or would I have to rebuild the clustered index for that to happen? (In which case I can't, because there isn't enough space.)

It also occurs to me that while a column alteration uses space in the log file, whenever I alter a column on a smaller table, the main data file grows as well.

Best Answer

If allocating more disk space or another drive is impossible, there might be ways to do this with minimal growth, assuming you can put the database into RESTRICTED_USER mode or otherwise prevent users/applications from modifying the data while you fix it.

First off, don't shrink your data or log files at this point. You'll free up space on the disk temporarily, but the files will just grow again - so as with most shrink operations this is pointless.

Next, take one of the char(10) columns and determine the length of the longest string it actually contains:

SELECT MAX(DATALENGTH(RTRIM(char10_column)))  -- RTRIM matters: char(10) is blank-padded,
FROM dbo.table;                               -- so without it DATALENGTH always returns 10

Add a new, nullable varchar column of that length. (I haven't supported 2005 in some time, so I forget whether this is fully or partially online - in theory this shouldn't be a size-of-data operation, but I don't know if 2005 was smart enough not to treat it as one.)

ALTER TABLE dbo.table ADD newvarchar10_column varchar(10);

In small chunks, as I describe here for deletes, update the new column to the value of the old column and set the old column to NULL. This will grow the table gradually, but will keep the impact on the log minimal.

UPDATE TOP (?top?) dbo.table -- not sure what your ?top? sweet spot will be
  SET newvarchar10_column = oldchar10_column,
      oldchar10_column = NULL
  WHERE oldchar10_column IS NOT NULL;

-- CHECKPOINT / BACKUP LOG

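Wrapped in a loop, the migration for one column might look like this sketch - the batch size is a guess you'll need to tune on your own system:

```sql
DECLARE @BatchSize int = 10000;  -- hypothetical starting point; tune on a test system

WHILE 1 = 1
BEGIN
    UPDATE TOP (@BatchSize) dbo.table
      SET newvarchar10_column = oldchar10_column,
          oldchar10_column = NULL
      WHERE oldchar10_column IS NOT NULL;

    IF @@ROWCOUNT = 0 BREAK;  -- every row has been migrated

    CHECKPOINT;  -- simple recovery model: allows log space reuse between batches
END
```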
Once all the data is in the new column, dropping the old column should be minimal work for the database, since there isn't any actual data left to log. If the old column is involved in any indexes, constraints, or schema-bound views/functions, you'll have to deal with those first.
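To find indexes and default constraints that reference the old column before you try the drop, you can query the catalog views - a sketch, assuming the same hypothetical table/column names as above:

```sql
-- Indexes (including the clustered index) that include the column
SELECT i.name AS index_name
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
  ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c
  ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'dbo.table')
  AND c.name = N'oldchar10_column';

-- Default constraints bound to the column
SELECT dc.name AS constraint_name
FROM sys.default_constraints AS dc
JOIN sys.columns AS c
  ON c.object_id = dc.parent_object_id AND c.column_id = dc.parent_column_id
WHERE dc.parent_object_id = OBJECT_ID(N'dbo.table')
  AND c.name = N'oldchar10_column';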

ALTER TABLE dbo.table DROP COLUMN oldchar10_column;

Then rename the new column so that it looks like the old one.

EXEC sys.sp_rename N'dbo.table.newvarchar10_column', 
       N'oldchar10_column', N'COLUMN';

Don't forget to update any procedures or code that still think this column is a char(10).
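A rough way to find stored procedures, functions, and views that still mention the column is a text match against the module definitions - crude, and it can return false positives, but it's a useful starting list:

```sql
SELECT OBJECT_NAME(m.object_id) AS module_name
FROM sys.sql_modules AS m
WHERE m.definition LIKE N'%oldchar10_column%';
```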

This is just a theory. You may want to try it on a non-critical system first to observe the effects on the data/log files, find the sweet spot for ?top?, and so on.