Table LogCount
:
Column1 int not null
Column2 int not null
Column3 int not null
[Date] datetime not null
[Count] float(53) not null
This table contains ~86 million rows and has the following indexes:
alter table LogCount add constraint PK_LogCount_Id primary key clustered
( [Date], Column1, Column2, Column3 )
go
create nonclustered index IX_Column2_Date on LogCount ( Column2, [Date] )
include ( [Count] )
go
sp_spaceused
gives the following:
name rows reserved data index_size unused
LogCount 85800181 8089216 KB 4226664 KB 3860968 KB 1584 KB
The Count
column doesn't and never will store floating-point numbers, only integers, so I changed it to a smallint
which (I expected) will save 6 bytes per row (float(53)
= 8 bytes, smallint
= 2 bytes):
drop index LogCount.IX_Column2_Date
go
alter table LogCount alter column [Count] smallint not null
go
create nonclustered index IX_Column2_Date on LogCount ( Column2, [Date] )
include ( [Count] )
go
Then I reran sp_spaceused
:
name rows reserved data index_size unused
LogCount 85800181 7670848 KB 5255528 KB 2414496 KB 824 KB
As expected, the index size has decreased drastically, but the data size has increased by a gigabyte!
I then reran the drop/alter/create statement above but using int
(4 bytes) and got the following result:
name rows reserved data index_size unused
LogCount 85800181 7848032 KB 5255528 KB 2591688 KB 816 KB
Then I tried float(1)
(also 4 bytes):
name rows reserved data index_size unused
LogCount 85800181 7848016 KB 5255528 KB 2591672 KB 816 KB
Finally I went back to the original float(53)
:
name rows reserved data index_size unused
LogCount 85800181 10680584 KB 7726896 KB 2952464 KB 1224 KB
Compared to the original, the data size has increased by ~3.3 GB (almost doubled) while the index size has decreased by ~900MB.
A colleague suggested the culprit could be that MSSQL is allocating additional pages for the alter column
statement and not freeing them afterwards, so I also tried executing dbcc shrinkdatabase
after each step, but the results were the same.
So my questions:
- Why does altering a column from a larger to a smaller datatype, cause more data space to be used?
- Is
sp_spaceused
reliable? If not, what should I be using instead? If there isn't a better option, how do I determine if changing a column's datatype will have a positive or negative effect on disk space usage?
Best Answer
When changing the data type of a column, SQL Server will choose to either:
Even if every row must be changed, SQL Server still take steps (where possible) to prioritize speed over final size, on the basis that we want DDL changes to complete as quickly as possible. Optimizing the storage space can wait for a maintenance window.
Changing float to smallint can be accommodated within the existing space allocated for the row, but it does leave some unused space. As has been mentioned, this can be reclaimed by fully rebuilding the changed structure.