I have a lot of large tables (around 10 million wide rows) which need to be regularly loaded into SQL Server 2016 for read-only reporting. I would like these tables to be as small as possible on disk, and this matters more than performance improvements in either loading or querying.
Here is what I have been doing for the tables which require no further indexing:
- Create the table with `DATA_COMPRESSION = PAGE`.
- Use bcp to bulk insert the data from a flat file into the new table.
Column types in the tables are varchar (never more than 512, not max), float, tinyint, or date (not datetime). All columns are created as nullable, and no primary or foreign keys are defined; they don't matter for the querying, and the tables are never updated directly. The default collation on everything is SQL_Latin1_General_CP1_CI_AS.
When I do this, I can see in `sys.partitions` that page compression has been applied to the heap, and I can see in `sys.indexes` that the fill factor is correctly 0 (100%). Since the tables are much smaller than uncompressed tables would be, I thought the compression was accomplished.
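For example, a metadata query along these lines shows both settings (the table name is a placeholder):

```sql
-- The compression level lives in sys.partitions and the fill factor
-- in sys.indexes; a heap appears in both with index_id = 0
SELECT p.data_compression_desc, i.fill_factor
FROM sys.partitions AS p
JOIN sys.indexes AS i
    ON i.object_id = p.object_id
   AND i.index_id  = p.index_id
WHERE p.object_id = OBJECT_ID('dbo.MyBigTable');
```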
However, if I then rebuild with the same option `DATA_COMPRESSION = PAGE`, the supposedly-already-compressed table gets about 30% smaller! It looks like it's going from about 17 rows per data page to 25 rows per page. (Only once, though. Rebuilding again after that doesn't make it any smaller than the first rebuild did.)
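The rebuild is just the plain statement, again with a placeholder table name:

```sql
-- Same compression option as the CREATE, yet the heap shrinks ~30%
ALTER TABLE dbo.MyBigTable REBUILD WITH (DATA_COMPRESSION = PAGE);
```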
The questions
So my questions are: (a) what is going on here? and (b) is there a way to get this extra-small compressed size directly as I load the table without having to rebuild after the data is loaded?
Best Answer
@HandyD is entirely correct; I only want to highlight some other methods for getting compression while inserting into a heap.
From the same document:

> When a heap is configured for page-level compression, pages receive page-level compression only in the following ways:
>
> - Data is bulk imported with bulk optimizations enabled.
> - Data is inserted using `INSERT INTO ... WITH (TABLOCK)` syntax and the table does not have a nonclustered index.
> - A table is rebuilt by executing the `ALTER TABLE ... REBUILD` statement with the PAGE compression option.

According to this, you could leverage minimally logged bulk inserts or use `INSERT INTO ... WITH (TABLOCK)` to get `PAGE` compression without having to do rebuilds. There are rules to get `PAGE` compression when inserting into a heap: add `-h "TABLOCK"` to your `bcp` command to get compression. `ROW` compression works without these prerequisites and is the least amount of compression seen in the examples below. Thanks @DenisRubashkin for pointing that out!

Testing
Example start data & BCP out command
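A minimal sketch of the setup, assuming a `TestDb` database and placeholder server and file names:

```sql
-- Page-compressed source heap matching the question's column types
CREATE TABLE dbo.CompressedHeap_Source
(
    Val  varchar(512),
    Num  float,
    Flag tinyint,
    Dt   date
) WITH (DATA_COMPRESSION = PAGE);

-- Fill it with a million reasonably compressible rows
INSERT INTO dbo.CompressedHeap_Source (Val, Num, Flag, Dt)
SELECT TOP (1000000)
       REPLICATE('A', 200),
       12345.678,
       1,
       '2019-01-01'
FROM sys.all_columns AS c1
CROSS JOIN sys.all_columns AS c2;
```

And the export to a flat file:

```
bcp "SELECT Val, Num, Flag, Dt FROM TestDb.dbo.CompressedHeap_Source" queryout "C:\Temp\HeapData.bcp" -c -T -S YourServer
```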
The ROW compressed and uncompressed size

The data size is at 132272 KB when doing a standard insert into the heap; this is `ROW` compressed but not `PAGE` compressed.

The data size without any compression is ~176216 KB for our test.
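As a sketch, the standard insert behind that 132272 KB number looks like this, where the destination heap is created exactly like the source, `WITH (DATA_COMPRESSION = PAGE)`:

```sql
-- Plain INSERT ... SELECT: no bulk optimizations, so the pages
-- only get ROW compression despite the PAGE setting on the heap
INSERT INTO dbo.CompressedHeap_Destination (Val, Num, Flag, Dt)
SELECT Val, Num, Flag, Dt
FROM dbo.CompressedHeap_Source;

-- Reports the data size of the heap
EXEC sp_spaceused 'dbo.CompressedHeap_Destination';
```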
INSERT INTO ... WITH TABLOCK

Inserting `WITH (TABLOCK)` gives us the `PAGE` compressed data size, 69480 KB.
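A sketch of the same insert with the table lock hint:

```sql
-- TABLOCK enables bulk optimizations on the heap, so the pages
-- receive PAGE compression as they are written
INSERT INTO dbo.CompressedHeap_Destination WITH (TABLOCK) (Val, Num, Flag, Dt)
SELECT Val, Num, Flag, Dt
FROM dbo.CompressedHeap_Source;
```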
BULK INSERT

Now when we create a destination heap table that is also `PAGE` compressed and do a bulk insert `WITH (TABLOCK)`, the data gets `PAGE` compressed and is also at 69480 KB, as the sketch below shows.
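Reusing the exported file and the placeholder path:

```sql
-- Bulk insert with TABLOCK into the page-compressed heap;
-- DATAFILETYPE = 'char' matches the -c (character mode) bcp export
BULK INSERT dbo.CompressedHeap_Destination
FROM 'C:\Temp\HeapData.bcp'
WITH (DATAFILETYPE = 'char', TABLOCK);
```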
BCP IN WITH TABLOCK

You can get the same results as the `BULK INSERT ... WITH (TABLOCK)` by using `bcp in` with the `-h "TABLOCK"` hint. This makes sense; they do the same thing internally. The resulting size is again 69480 KB.
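With the placeholder server and path, the command looks like:

```
bcp TestDb.dbo.CompressedHeap_Destination in "C:\Temp\HeapData.bcp" -c -T -S YourServer -h "TABLOCK"
```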
BCP IN WITHOUT TABLOCK

Using bcp to load data from the same file into a copy of the destination table, a standard bcp command without the hint results in data that is not `PAGE` compressed: the data size is back at 132272 KB (`ROW` compressed).
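For comparison, the command without the hint, where `CompressedHeap_Destination2` stands in for the copy of the destination table:

```
bcp TestDb.dbo.CompressedHeap_Destination2 in "C:\Temp\HeapData.bcp" -c -T -S YourServer
```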