Sql-server – bulk insert into an empty page-compressed table and get full compression

bcpcompressionsql serversql-server-2016

I have a lot of large tables (around 10 million wide rows) which need to be regularly loaded into SQL Server 2016 for read-only reporting. I would like these tables to be as small as possible on disk, and this matters more than performance improvements in either loading or querying.

Here is what I have been doing for the tables which require no further indexing:

  1. Create the table with DATA_COMPRESSION=PAGE.
  2. Use bcp to bulk insert the data from a flat file into the new table.

Column types in the tables are varchar (never more than 512, not max), float, tinyint, or date (not datetime). All columns are created as nullable and no primary or foreign keys are defined — they don't matter for the querying and the tables are never updated directly. Default collation on everything is SQL_Latin1_General_CP1_CI_AS.

When I do this, I can see in sys.allocation_units that page data compression has been applied to the heap and I can see in sys.partitions that the fill factor is correctly 0 (100%). Since the tables are much smaller than uncompressed tables would be, I thought the compression was accomplished.

However, if I then rebuild with the same option DATA_COMPRESSION=PAGE, the supposedly-already-compressed table gets about 30% smaller! It looks like it's going from about 17 rows per data page to 25 rows per page. (Only once, though. Rebuilding again after that doesn't make it any smaller than the first rebuild did.)

The questions

So my questions are: (a) what is going on here? and (b) is there a way to get this extra-small compressed size directly as I load the table without having to rebuild after the data is loaded?

Best Answer

@HandyD is entirely correct, I only want to highlight some other methods to get compression while inserting into a heap.

From the same document

When a heap is configured for page-level compression, pages receive page-level compression only in the following ways:

  • Data is bulk imported with bulk optimizations enabled.
  • Data is inserted using INSERT INTO ... WITH (TABLOCK) syntax and the table does not have a nonclustered index.
  • A table is rebuilt by executing the ALTER TABLE ... REBUILD statement with the PAGE compression option.

According to this, you could leverage minimally logged bulk inserts or use INSERT INTO ... WITH (TABLOCK) to get PAGE compression without having to do rebuilds.


(a) what is going on here? and (b) is there a way to get this extra-small compressed size directly as I load the table without having to rebuild after the data is loaded?

There are rules to get PAGE compression when inserting into a heap, add -h "TABLOCK" to your bcp command to get compression.

ROW compression works without these prerequisites and is the least amount of compression used in below examples, thanks @DenisRubashkin for pointing that out!


Testing

Example start data & BCP out command


--Tested on SQL Server 2014 SP2

CREATE TABLE dbo.CompressedHeap_Source( Val varchar(512), 
                                 Datefield Date, 
                                 Tinyfield TinyINT,
                                 Floatfield float) 
WITH (DATA_COMPRESSION = PAGE);

INSERT INTO dbo.CompressedHeap_Source
(
Val,Datefield,Tinyfield,Floatfield)

SELECT 'Bla',cast(getdate() as date),1,1.2412
FROM master..spt_values spt1
CROSS APPLY master..spt_values spt2;

--bcp TEST.dbo.CompressedHeap_Source out E:\Data\HeapData.bcp -c -T

The ROW compressed and Uncompressed size

The data size is at 132272 KB when doing a standard insert into the heap, this is ROW compressed but not PAGE compressed.

The data size without any compression is ~ 176216 KB for our test.

exec sp_spaceused 'dbo.CompressedHeap_Source'

name                    rows                    reserved    data      index_size    unused
CompressedHeap_Source   6365530                 132296 KB   132272 KB   8 KB    16 KB

INSERT INTO ... WITH TABLOCK

Inserting WITH TABLOCK gives us the PAGE compressed data size, 69480 KB.

INSERT INTO dbo.CompressedHeap_Source2  WITH(TABLOCK)
(
Val,Datefield,Tinyfield,Floatfield)

SELECT 'Bla',cast(getdate() as date),1,1.2412
FROM master..spt_values spt1
CROSS APPLY master..spt_values spt2

BULK INSERT

Now when we create a destination heap table that is also page compressed, and do a bulk insert with tablock:

CREATE TABLE dbo.CompressedHeap_Destination( Val varchar(512), 
                                 Datefield Date, 
                                 Tinyfield TinyINT,
                                 Floatfield float) 
WITH (DATA_COMPRESSION = PAGE);

bulk insert dbo.CompressedHeap_Destination

from 'E:\Data\HeapData.bcp'  with (TABLOCK)

The data gets page compressed and is also at 69480 KB:

name    rows    reserved    data    index_size  unused
CompressedHeap_Destination  6365530                 69512 KB    69480 KB    8 KB    24 KB

BCP IN WITH TABLOCK

You can get the same results as the BULK INSERT WITH TABLOCK by using BCP IN with the -h "TABLOCK" hint. This makes sense, they do the same internally

--bcp TEST.dbo.CompressedHeap_Destination2 IN E:\Data\HeapData.bcp -c -T -h "TABLOCK"

With the resulting size being 69480 KB

BCP IN WITHOUT TABLOCK

Using BCP to load data from the same file in a copy of the destination table

And a standard bcp command results into non compressed data:

--bcp TEST.dbo.CompressedHeap_Destination2 IN E:\Data\HeapData.bcp -c -T 

With the data size at 132272 KB (row compressed).