SQL Server – What effect will reducing the size of a varchar column have on the database file?

disk-space, sql-server, sql-server-2012, varchar

We have a number of tables in our database that have VARCHAR(MAX) columns where a VARCHAR(500) (or something much smaller than max) will suffice. Naturally I want to clean these up and get the sizes down to more reasonable levels. The 'how' to do this I understand: my question is what will altering these columns do to the pages and extents on disk? (There's a lot of info out there about what happens when you grow a column, but I'm having trouble finding info on what happens when you shrink one.)

Some of the tables have a very small row count, so I'm not worried about the cost of the change, but some are quite large, and I'm worried about them potentially being reorganized and causing a lot of blocking/downtime. In practical terms, I just want a way to estimate a maintenance window. In general, I'd like to understand better how the database engine behaves in this case.

Thanks in advance!

EDIT:

I have 20 tables I'm looking at, though only half of them have row counts greater than 1,000. The biggest has almost a million rows. The worst offender is a table with 350,000 rows and four VARCHAR(MAX) columns that can be shrunk to VARCHAR(500).
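A minimal sketch of one way to inventory columns like these (sys.columns reports max_length = -1 for any (N)VARCHAR(MAX) column; the schema and table names come from whatever database you run it in):

SELECT s.name AS [schema_name], t.name AS [table_name], c.name AS [column_name]
FROM sys.columns AS c
INNER JOIN sys.tables AS t ON t.[object_id] = c.[object_id]
INNER JOIN sys.schemas AS s ON s.[schema_id] = t.[schema_id]
INNER JOIN sys.types AS ty ON ty.user_type_id = c.user_type_id
WHERE ty.name IN (N'varchar', N'nvarchar')
AND c.max_length = -1; -- -1 means (MAX)

Checking SELECT MAX(DATALENGTH(column_name)) on each candidate confirms the longest existing value (in bytes) will actually fit before altering.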

Best Answer

First things first: How much data is there in the table? Number of rows and size of the table?
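A quick way to get both (dbo.YourBigTable below is just a placeholder for one of your tables):

EXEC sys.sp_spaceused @objname = N'dbo.YourBigTable';
-- returns rows, reserved, data, index_size, and unused for the table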

Second: Can you back up and restore this table to a test server and run the alter statement there to see the impact (assuming that is feasible and the table is not too large to fit on a non-Production system)? I always find that testing in my own environment is more accurate than advice from the interwebs, since there are several factors that can influence the outcome which might not be mentioned in the question, simply because nobody knew those factors could affect the outcome.

Third: increasing the size of a variable-length field is (assuming you don't go over the 8060-byte row limit) a simple meta-data operation, since no actual data would change for such an operation. BUT, on the other hand, reducing the size of a variable-length field, even to a size the data will obviously fit in, is not a simple meta-data change, because SQL Server doesn't know, prior to scanning all of the rows, that the newly requested size is valid.
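To make that concrete, here is a minimal sketch of the two cases (dbo.SomeTable and [Notes] are hypothetical names):

-- Growing: meta-data only, completes almost instantly
ALTER TABLE dbo.SomeTable ALTER COLUMN [Notes] VARCHAR(1000) NULL;

-- Shrinking: every row must be scanned to validate the new size,
-- and a schema-modification (Sch-M) lock is held for the duration
ALTER TABLE dbo.SomeTable ALTER COLUMN [Notes] VARCHAR(500) NULL;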

Hence: Yes, this will lock the table for a period of time. How much time? Well, here is the test that I just did:

I had, from some other testing, a table with a single INT NOT NULL field and 1 million rows. I copied it to a new table for the purpose of doing this test via:

SELECT *, CONVERT(NVARCHAR(MAX), NEWID()) AS [StringField]
INTO dbo.ResizeTest
FROM dbo.ClusteredUnique;

This way I was starting with a similar scenario of having a MAX field (I just realized that you have VARCHAR while I am using NVARCHAR, but that shouldn't alter the behavior I am seeing) that I could then change to 500. And it has data in it that can easily fit within 500 characters. That SELECT ... INTO took a few minutes.

I then ran:

ALTER TABLE dbo.ResizeTest ALTER COLUMN [StringField] NVARCHAR(500) NULL;

And that took just over 11 minutes.

I just re-ran the test again, this time dropping the [ResizeTest] table and changing both NVARCHARs to be just VARCHAR, just to be super-sure that I am comparing apples to something that at least looks like an apple ;-).

The initial table creation took 20 seconds while the ALTER TABLE took 2 minutes.

So, in terms of estimating downtime, that is really hard to do, as it depends on disk I/O speeds, whether or not any auto-growth operations need to happen on the data file and/or the transaction log, etc. That is probably a large part of why my first test took 11 minutes to alter while the second, even with the VARCHAR data being half the size of the NVARCHAR data, took only 2 minutes (i.e. the files were pre-grown at that point). But still, you should keep in mind that my test was running on my laptop, which does not have the fastest disk, and it was also just 1 million rows of 2 small columns (22 or so bytes per row).
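If you do test on a restored copy, a simple sketch for capturing the duration (using the same dbo.ResizeTest table as above):

DECLARE @start DATETIME2(3) = SYSDATETIME();

ALTER TABLE dbo.ResizeTest ALTER COLUMN [StringField] VARCHAR(500) NULL;

SELECT DATEDIFF(SECOND, @start, SYSDATETIME()) AS [AlterDurationSeconds];

Pre-growing the data and log files before the test takes the auto-growth variable out of the measurement.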

And since you asked what it will do to the data pages, here is your answer. I ran sp_spaceused after creating the table, after doing the ALTER COLUMN, and after doing ALTER TABLE dbo.ResizeTest REBUILD;. The results (the following numbers are based on the second test using VARCHAR, not the first test using NVARCHAR):

After initial table creation:        526,344 KB
After ALTER COLUMN VARCHAR(500):   1,031,688 KB  <--- !! Yikes!!
After ALTER REBUILD:                 526,472 KB
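
The near-doubling after the ALTER COLUMN is most likely because the values get written into a new version of the column, while the bytes from the old version remain on the pages until the table is rebuilt. The measurement sequence, as a sketch:

EXEC sys.sp_spaceused N'dbo.ResizeTest'; -- after initial creation
ALTER TABLE dbo.ResizeTest ALTER COLUMN [StringField] VARCHAR(500) NULL;
EXEC sys.sp_spaceused N'dbo.ResizeTest'; -- roughly double: old column bytes still on the pages
ALTER TABLE dbo.ResizeTest REBUILD;      -- rewrites the pages and reclaims the space
EXEC sys.sp_spaceused N'dbo.ResizeTest'; -- back to (nearly) the original size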

If you are concerned about needing to keep the operation to the shortest time possible, check out an article that I wrote about doing just that: Restructure 100 Million Row (or more) Tables in Seconds. SRSLY! (free registration required).