Page counts shown by sys.dm_db_index_physical_stats and sys.allocation_units can vary significantly after various DDL modifications to the object concerned.
From the Microsoft Docs page on sys.sp_spaceused:
There are some situations, for example, after an index is dropped, when the space information for the table may not be current. [...] Use updateusage only when you suspect incorrect values are being returned and when the process will not have an adverse effect on other users or processes in the database. If preferred, DBCC UPDATEUSAGE can be run separately.
When you drop or rebuild large indexes, or drop or truncate large tables, the Database Engine defers the actual page deallocations, and their associated locks, until after the transaction commits. Deferred drop operations do not release allocated space immediately. Therefore, the values returned by sp_spaceused immediately after dropping or truncating a large object may not reflect the actual disk space available.
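For reference, a minimal way to force the usage counts current before checking sizes (the table name dbo.MyTable is a placeholder, not from the question):

```sql
-- Refresh the stored usage counts for one table while reporting its size
EXEC sys.sp_spaceused @objname = N'dbo.MyTable', @updateusage = 'true';

-- Or correct the counts for every object in the current database
DBCC UPDATEUSAGE (0) WITH NO_INFOMSGS;
```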
sys.dm_db_index_physical_stats has the following definition (in SQL Server 2016):
create function sys.dm_db_index_physical_stats
(
    @DatabaseId      SMALLINT = 0,
    @ObjectId        INT = 0,
    @IndexId         INT = -1,
    @PartitionNumber INT = 0,
    @Mode            nvarchar(20) = NULL
)
returns table
as
return select *
       from OpenRowset
       (
           TABLE INDEXANALYSIS,
           @DatabaseId,
           @ObjectId,
           @IndexId,
           @PartitionNumber,
           @Mode
       )
As you can see, it uses the internal table INDEXANALYSIS, whereas sys.allocation_units uses a different source, ALUCOUNT, as shown in its definition:
CREATE VIEW sys.allocation_units AS
SELECT au.auid AS allocation_unit_id,
au.type,
ip.name AS type_desc,
au.ownerid AS container_id,
convert(int, au.fgid) AS data_space_id,
isnull(ct.reserved_pages, au.pcreserved) AS total_pages,
isnull(ct.used_pages, au.pcused) AS used_pages,
isnull(ct.data_pages, au.pcdata) AS data_pages
FROM sys.sysallocunits au
OUTER APPLY OpenRowset(TABLE ALUCOUNT, au.ownerid, au.type, 0, 0) ct
LEFT JOIN sys.syspalvalues ip
    ON ip.class = 'AUTY' AND ip.value = au.type
These sources vary in how eagerly they are kept consistent, as you can see in the output of your queries. If you require an accurate count of the pages used by each object, do the following:
- Ensure no open, uncommitted transactions exist that have modified the table/index in question.
- Run EXEC sys.sp_spaceused with the @updateusage = 'true' option, or run DBCC UPDATEUSAGE separately.
- Run CHECKPOINT to ensure all modifications have been written to disk.
- Check the output of sys.dm_db_index_physical_stats and sys.allocation_units; the counts should now be the same.
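The steps above can be sketched as one script (dbo.MyTable is a placeholder; the join on container_id uses hobt_id for in-row and row-overflow units, partition_id for LOB units):

```sql
-- 1. Ensure no open transaction of your own has modified the table,
--    then correct any stale usage counts.
DBCC UPDATEUSAGE (0, 'dbo.MyTable') WITH NO_INFOMSGS;

-- 2. Flush dirty pages to disk.
CHECKPOINT;

-- 3. Compare the two sources; the page counts should now agree.
SELECT au.type_desc, au.total_pages, au.used_pages, au.data_pages
FROM sys.allocation_units AS au
JOIN sys.partitions AS p
    ON au.container_id = CASE au.type WHEN 2 THEN p.partition_id ELSE p.hobt_id END
WHERE p.object_id = OBJECT_ID(N'dbo.MyTable');

SELECT index_id, alloc_unit_type_desc, page_count
FROM sys.dm_db_index_physical_stats
     (DB_ID(), OBJECT_ID(N'dbo.MyTable'), NULL, NULL, 'SAMPLED');
```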
First things first: How much data is there in the table? Number of rows and size of the table?
Second: Can you back up and restore this table to a test server and run the ALTER statement there to see the impact (assuming that is feasible and the table is not too large to fit on a non-production system)? I always find that testing in my own environment is more reliable than advice from the interwebs, since several factors can influence the outcome that might not be mentioned in the question, simply because the asker does not know those factors matter.
Third: increasing the size of a variable-length field is (assuming you don't go over the 8060-byte limit) a simple meta-data operation, since no actual data changes. BUT, on the other hand, reducing the size of a variable-length field, even to something that will more than obviously work, is not a simple meta-data change, because SQL Server doesn't know, prior to scanning all of the rows, that the newly requested size is valid.
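To illustrate the asymmetry (dbo.Demo and its Notes column are hypothetical):

```sql
-- Growing a variable-length column is a metadata-only change and completes instantly:
ALTER TABLE dbo.Demo ALTER COLUMN Notes VARCHAR(1000) NULL;

-- Shrinking it requires SQL Server to scan every row to validate the new limit,
-- holding a schema-modification lock for the duration:
ALTER TABLE dbo.Demo ALTER COLUMN Notes VARCHAR(100) NULL;
```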
Hence: Yes, this will lock the table for a period of time. How much time? Well, here is the test that I just did:
I had, from some other testing, a table with a single INT NOT NULL field and 1 million rows. I copied it to a new table for the purpose of doing this test via:
SELECT *, CONVERT(NVARCHAR(MAX), NEWID()) AS [StringField]
INTO dbo.ResizeTest
FROM dbo.ClusteredUnique;
This way I was starting with a similar scenario of having a MAX field (I just realized that you have VARCHAR and I am using NVARCHAR, but that shouldn't alter the behavior I am seeing) that I could then change to 500. And it has data in it that can easily fit within 500 characters. That took a few minutes.
I then ran:
ALTER TABLE dbo.ResizeTest ALTER COLUMN [StringField] NVARCHAR(500) NULL;
And that took just over 11 minutes.
I just re-ran the test, this time dropping the [ResizeTest] table and changing both NVARCHARs to VARCHAR, just to be super-sure that I am comparing apples to something that at least looks like an apple ;-).
The initial table creation took 20 seconds while the ALTER TABLE took 2 minutes.
So, in terms of estimating downtime, that is really hard to do, as it depends on disk I/O speed, whether any auto-growth operations need to happen on the data file and/or the transaction log, etc. That is probably a large part of why my first test took 11 minutes to alter while the second, even with VARCHAR being half the size of the NVARCHAR data, took only 2 minutes (i.e. the files were pre-grown at that point). But still, keep in mind that my test was running on my laptop, which does not have the fastest disk, but it was also just 1 million rows of 2 small columns (22 or so bytes per row).
And since you asked what it will do to the data pages, here is your answer. I ran sp_spaceused after creating the table, after doing the ALTER COLUMN, and after doing ALTER TABLE dbo.ResizeTest REBUILD;. The results (the following numbers are from the second test, using VARCHAR, not the first test using NVARCHAR):
- After initial table creation: 526,344 KB
- After ALTER COLUMN VARCHAR(500): 1,031,688 KB <--- !! Yikes !!
- After ALTER REBUILD: 526,472 KB
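For anyone who wants to reproduce the measurement sequence against the test table built above, it is just three sp_spaceused calls around the two operations:

```sql
EXEC sys.sp_spaceused N'dbo.ResizeTest';          -- baseline

ALTER TABLE dbo.ResizeTest
    ALTER COLUMN [StringField] VARCHAR(500) NULL; -- size-of-data operation
EXEC sys.sp_spaceused N'dbo.ResizeTest';          -- reserved space roughly doubles

ALTER TABLE dbo.ResizeTest REBUILD;               -- reclaims the abandoned space
EXEC sys.sp_spaceused N'dbo.ResizeTest';          -- back near the baseline
```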
If you are concerned about needing to keep the operation to the shortest time possible, check out an article that I wrote about doing just that: Restructure 100 Million Row (or more) Tables in Seconds. SRSLY! (free registration required).
Best Answer
The solution I'd think of for this problem is to run a weekly job that executes sp_spaceused for all tables in the database and saves the results in a table. If the size difference for any table is greater than, let's say, 10%, I would run DBCC CLEANTABLE.
My code to loop through table sizes looks like this:
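(The original snippet did not survive; the following is only a sketch of one way such a loop could look, using the undocumented sys.sp_MSforeachtable procedure and an assumed history table named dbo.TableSizeHistory:)

```sql
-- One-time setup: a table to hold the weekly snapshots (name is an assumption).
-- sp_spaceused returns the size columns as strings like '526344 KB'.
CREATE TABLE dbo.TableSizeHistory
(
    capture_date date        NOT NULL DEFAULT (GETDATE()),
    table_name   sysname     NOT NULL,
    row_count    varchar(20) NULL,
    reserved     varchar(18) NULL,
    data         varchar(18) NULL,
    index_size   varchar(18) NULL,
    unused       varchar(18) NULL
);

-- Weekly job step: capture sp_spaceused output for every table in one pass.
INSERT INTO dbo.TableSizeHistory
    (table_name, row_count, reserved, data, index_size, unused)
EXEC sys.sp_MSforeachtable N'EXEC sys.sp_spaceused ''?'';';
```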
Now you only need to build the logic that compares the week-over-week size change, and schedule the job.