Sql-server – Effect of changing data types with row compression enabled

compressiondatatypessql server

The company I work for has a couple of SQL Server databases with tables containing +- 500.000.000 rows. We are running Enterprise editions of SQL Server 2008R2 and 2014.

Big data types

When I look at the used data types in the largest table I see a lot of BIGINT columns. Examining the data in those columns with a script from Thomas Larock and scripting the MIN() and MAX() values myself I concluded that data in those BIGINT columns can easily be fitted in INT or even SMALLINT/TINYINT columns. (I'm aware that some columns could need the range of BIGINT in the future,so I'm not blindly changing all the data types without talking to developers first)

When comparing the possible savings when changing the datatypes it seems the table could be half the size it is currently (not even considering indexes and other tables). These numbers are without any data compression.

ROW Compression

On the large table ROW Compression is enabled. I'm wondering what the actual impact of 'downsizing' the columns datatypes might be, keeping in mind that ROW compression is using only the bytes that are needed. For example, if a value can be stored in 1 byte, storage will take only 1 byte.

Actual question

Would it help to downsize data types, so the ROW compression uses less resources? Or is it save to say 'because ROW compression is enabled there is no difference between BIGINT, INT or SMALLINT data types'?

Best Answer

As the documentation you alredy have linked states, ROW compression uses only the bytes that are needed. Once ROW compression is in use, the CPU cycles used to convert from/to int or bigint are the same: I would not worry about that.

BTW, if you're not sure whether int/bigint has an impact on the database size or not (it hasn't), you can see for yourself with a quick and dirty repro:

USE tempdb;
GO

CREATE TABLE SomeTable (
    SomeColumn bigint
)
GO

ALTER TABLE [dbo].[SomeTable] REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = NONE);

INSERT INTO SomeTable 
SELECT TOP 10000000 ROW_NUMBER() OVER (ORDER BY(SELECT NULL))
FROM sys.all_columns AS A1
CROSS JOIN sys.all_columns AS A2;
GO

-- Rebuild the heap, so that pages compact nicely
ALTER TABLE dbo.SomeTable 
REBUILD 
WITH 
(
    MAXDOP = 1, 
    ONLINE = OFF,
    FILLFACTOR = 100,
    PAD_INDEX = OFF
);
GO

SELECT SUM(page_count)
FROM sys.dm_db_index_physical_stats(
        DB_ID(),
        OBJECT_ID('SomeTable'),
        DEFAULT,
        DEFAULT,
        'detailed'
    ) AS ips;


-- 21009 pages used


ALTER TABLE [dbo].[SomeTable] REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = ROW);
GO


SELECT SUM(page_count)
FROM sys.dm_db_index_physical_stats(
        DB_ID(),
        OBJECT_ID('SomeTable'),
        DEFAULT,
        DEFAULT,
        'detailed'
    ) AS ips;

-- 13587 pages used


ALTER TABLE SomeTable ALTER COLUMN SomeColumn int;

-- Rebuild the heap, so that pages compact nicely
ALTER TABLE dbo.SomeTable 
REBUILD 
WITH 
(
    MAXDOP = 1, 
    ONLINE = OFF,
    FILLFACTOR = 100,
    PAD_INDEX = OFF
);
GO


SELECT SUM(page_count)
FROM sys.dm_db_index_physical_stats(
        DB_ID(),
        OBJECT_ID('SomeTable'),
        DEFAULT,
        DEFAULT,
        'detailed'
    ) AS ips;

-- 13587 pages used (same as bigint)