Sql-server – Moving data from table with VARCHAR(50) fields to table with numeric fields increases table size

datatypesdisk-spaceoptimizationsql-server-2012

I want to understand why migrating data in a table with all VARCHAR(50) fields to a table with optimized smaller types caused the new table (containing 61,065,164 rows) to be 4.46 GB, which is larger than the original table which is 3.1 GB. I expected the new table to be smaller, not larger.

  1. table1 structure to optimize which is created by usual Import wizard

    CREATE TABLE [dbo].table1(
    [dw_date_key] [varchar](50) NULL,
    [dw_OBFID] [varchar](50) NULL,
    [OBFID] [varchar](50) NULL,
    [account_link_code] [varchar](50) NULL,
    [dormant_0_6_total] [varchar](50) NULL,
    [dormant_7_13_total] [varchar](50) NULL,
    [dormant_14_20_total] [varchar](50) NULL,
    [dormant_21_28_total] [varchar](50) NULL,
    [dormant_0_15_total] [varchar](50) NULL,
    [dormant_16_30_total] [varchar](50) NULL,
    [dormant_0_30_total] [varchar](50) NULL,
    [dormant_31_60_total] [varchar](50) NULL,
    [dormant_61_90_total] [varchar](50) NULL,
    [val_total] [varchar](50) NULL,
    [return_dormancy] [varchar](50) NULL,
    [new_val_total] [varchar](50) NULL,
    [gross_adds] [varchar](50) NULL,
    [platform_movement] [varchar](50) NULL,
    [keep_my_no] [varchar](50) NULL,
    [sdp_snap_ma] [varchar](50) NULL,
    [contract_type] [varchar](50) NULL,
    [dormant_days] [varchar](50) NULL,
    [registration_date] [varchar](50) NULL,
    [activation_date] [varchar](50) NULL,
    [last_activity_date] [varchar](50) NULL,
    [last_platform_movement_date] [varchar](50) NULL,
    [create_dt] [varchar](50) NULL,
    [batch_id] [varchar](50) NULL,
    [val_returne] [varchar](50) NULL
    ) ON [PRIMARY]
    
  2. table2 which mostly benefits from 18 bit columns instead of 18 varchar columns

    CREATE TABLE dbo.table2
    (
    dw_date_key int not NULL,
    dw_OBFID bigint not NULL,
    OBFID bigint not NULL,
    account_link_code varchar(50) not NULL,
    dormant_0_6_total bit not NULL,
    dormant_7_13_total bit not NULL,
    dormant_14_20_total bit not  NULL,
    dormant_21_28_total bit not NULL,
    dormant_0_15_total bit not NULL,
    dormant_16_30_total bit not NULL,
    dormant_0_30_total bit not NULL,
    dormant_31_60_total bit not NULL,
    dormant_61_90_total bit not NULL,
    val_total bit not NULL,
    return_dormancy bit not NULL,
    new_val_total bit not  NULL,
    gross_adds bit not NULL,
    platform_movement bit not NULL,
    keep_my_no bit not NULL,
    sdp_snap_ma bit not NULL,
    contract_type varchar(1) not NULL,
    dormant_days smallint not NULL,
    registration_date int not NULL,
    activation_date int not NULL,
    last_activity_date int not NULL,
    last_platform_movement_date int not NULL,
    create_dt datetime not NULL,
    batch_id int not NULL,
    val_returne bit not NULL
    )  ON [PRIMARY]
    GO
    
  3. Sample data for all columns:

    20141110,989303710761,9303710761,8068176470,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,P,396,20120806,20120806,20131010,20131010,2014/11/13,18:41:48,470816,0

Notes:

  • both tables have no index and sizes are based on SSMS storage Data Space in table properties and index size is very small (Index Space = 0.023 MB)
  • SQL 2012 (11.0.5058.0) and recovery model are bulk-logged.
  • I have heard that bit type actually get benefits when you having large number of them in a table because each 8 one of them occupy 1 byte theatrically.
  • I run a datalength() for all my columns and here are the 2 result for 2 tables

    8,12,10,10,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,3,8,8,8,8,19,6,1

    table1 sample row sum = 118 bytes

    4,08,08,08,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,4,4,4,4,08,4,1

    table2 sample row sum = 76 bytes (even without considering bit space compensation which is displaced all with 1 byte here)

Main question:

If a sum up the data length result it says that I have reduced 118 bytes of each row to 76 for each row on average. That means 35% less.

But why I am not close to this number and the new table structure takes up more space?

My final goal is to automate the current manual ETL process using SSIS and optimize types to reduce the database size which is currently 400GB and will get larger weekly and also better query performance and better indexing instead of simple string types.

Any help is appreciated.

update

  • rebuild does not have any effect since there is no unused space
  • making all the fields as not null was possible and does not have any effect
  • output of the sp_spaceused for the main table with all varchar(50)

    rows reserved data index_size unused
    61065164 3251096 KB 3250872 KB 24 KB 200 KB

  • this new table is all not null and i updated and is rebuild after insert into command and i updated the main table script with not null

    • rows reserved data index_size unused
    • 61065164 4925216 KB 4924872 KB 32 KB 312 KB
  • i tried SELECT COUNT(IIF(RTRIM(dw_date_key )= '', 1, null)) as suggested by @srutzky for all the columns in source table but there are no empty string or null except for the 2 columns of account_link_code and contract_type which are also varchar in final table

Best Answer

Interesting question, I created both table1 and table2 in SQL Server 2014 and populated 2000 rows of the same data you have provided. When checking the amount of space consumed by each table, here is what I got:

table1 (varchar) - 392K
table2 (bit) - 168K

However when I alter all the column types in table1 to match table2, the space used grew to 540K. After rebuilding the table, the space drops down to 160K:

alter table table1 rebuild;

So you should see the disk space savings using the leaner data types after reclaiming the used space. Best of luck.