Sql-server – Large difference in disk space used by partitions

etlpartitioningsql server

I have a partitioned table that contains about 155 million records total. It's partitioned by year and the partitions for both the current and previous years contain about 25 million records each.

However, the current year partition is using over 15 GB of disk and the previous year partition is only using 2.5 GB.

Each night, roughly 80 k records are loaded by SSIS into the current year partition. I also have a partition aligned index that I drop and rebuild before and after loading the table. I don't have too much practice with partitioned tables, so what might cause the big discrepancy in disk space?

[int] IDENTITY(1,1) NOT NULL,  
[varchar](10) NULL,  
[datetime] NULL,  
[datetime] NULL,  
[int] NULL,  
[int] NULL,  
[varchar](2) NULL,  
[varchar](20) NULL,  
[int] NULL,  
[int] NULL,  
[varchar](11) NULL,  
[int] NULL,  
[varchar](10) NULL,  
[int] NULL,  
[varchar](80) NULL,  
[varchar](10) NULL,  
[varchar](max) NULL,  
[varchar](10) NULL  

sys.dm_db_partition_stats

Best Answer

What might cause the big discrepancy in disk space?

Typically this is caused by rows that have been deleted. Heaps do not reclaim space for data pages emptied by DELETE operations unless a table lock is taken at the time of the deletion. Even then, other factors like the possibility of row-overflow data, or an enabled row-versioning isolation level can prevent space being reclaimed.

Another factor that usually has a much smaller impact on space usage is that rows with variable-length columns that expand may not fit on the original heap page, in which case the row is forwarded to another page. Forwarded rows are more usually associated with scan performance issues than pure space usage, however.

Overall, tables that experience significant deletions (without the space being reused quickly by new rows) and/or variable column updates that do not fit in place are often better configured with a clustered index. Clustered tables usually deallocate empty pages very quickly.

You can use the sys.dm_db_index_physical_stats DMV to see physical details for a heap or index:

SELECT
    DDIPS.index_id,
    DDIPS.partition_number,
    DDIPS.index_type_desc,
    DDIPS.alloc_unit_type_desc,
    DDIPS.avg_fragmentation_in_percent,
    DDIPS.fragment_count,
    DDIPS.avg_fragment_size_in_pages,
    DDIPS.page_count,
    DDIPS.avg_page_space_used_in_percent,
    DDIPS.record_count,
    DDIPS.avg_record_size_in_bytes,
    DDIPS.forwarded_record_count
FROM sys.dm_db_index_physical_stats
    (
        DB_ID(),
        OBJECT_ID(N'dbo.t1', N'U'),
        0,
        NULL,
        'DETAILED'
    ) AS DDIPS;

The following (trivial) example shows a heap DELETE not releasing any empty pages:

SET NOCOUNT ON;
CREATE TABLE t1 (c1 char(8000) DEFAULT 'a'); 
GO
INSERT INTO t1 DEFAULT VALUES; 
GO 1000

-- 1000 pages allocated
SELECT
    DDIPS.index_id,
    DDIPS.partition_number,
    DDIPS.page_count
FROM sys.dm_db_index_physical_stats
    (
        DB_ID(),
        OBJECT_ID(N'dbo.t1', N'U'),
        0,
        NULL,
        'DETAILED'
    ) AS DDIPS;

-- Delete all the data from the heap
DELETE FROM t1;

-- Still 1000 pages allocated
-- Even though the table is empty
SELECT
    DDIPS.index_id,
    DDIPS.partition_number,
    DDIPS.page_count
FROM sys.dm_db_index_physical_stats
    (
        DB_ID(),
        OBJECT_ID(N'dbo.t1', N'U'),
        0,
        NULL,
        'DETAILED'
    ) AS DDIPS;
GO
DROP TABLE dbo.t1;

If you try the example again, but with a table lock (DELETE FROM t1 WITH (TABLOCK)) the deletion frees all empty pages (assuming the database does not have the READ_COMMITTED_SNAPSHOT option set to ON etc.)