Sql-server – Compression on a heap

compressionheapsql server

The follow is a paragraph from Microsoft Docs:

New pages allocated in a heap as part of DML operations will not use PAGE compression until the heap is rebuilt. Rebuild the heap by removing and reapplying compression, or by creating and removing a clustered index.

I can't figure out why this is the case. If I have a heap with a specified compression setting, why wouldn't it be applied to a page that belongs to the table?

Thanks

Best Answer

While I don't know the specific internal mechanism(s) that are responsible for the differences, I can say that Heaps are managed (internally) slightly differently than Clustered Indexes (and possibly also Nonclustered Indexes) :

Deleting rows from a Heap such that one or more data pages are empty (no allocated rows) does not necessarily free up that space. You will likely need to either create, and then drop, a Clustered Index on the table, or call ALTER TABLE [TableName] REBUILD; (as of SQL Server 2014 ?). Please see the Microsoft Docs page for DELETE for more details and options.
Inserting individual rows (i.e. not a set-based INSERT) into a Heap does not fill the data pages as fully as it does with Clustered Indexes. Clustered Indexes will fit rows as long as there is space for the row (data and row-overhead) plus the 2-byte overhead of the slot array. Data pages in Heaps, however, don't use the number of bytes left on the page, but instead use a very generalized indicator of how full the page is, and there aren't that many levels that are reported. The levels are something along the lines of: 0%, 20%, 50%, 80%, and 100% full. And it will switch over to 100% while there is still space for another row (and in fact, had those same number of rows been inserted in a set-based operation, then it would have filled the page as much as possible). Of course, just like with the DELETE operations, rebuilding the Heap will pack as many rows as will fit onto the data page.

Now consider the following information, taken from the "When Page Compression Occurs" section of the Microsoft Docs page for Page Compression Implementation:

... As data is added to the first data page, data is row-compressed. ... When the page is full, the next row to be added initiates the page compression operation. The whole page is reviewed; ...

Hence, it seems entirely inline with this other Heap behavior that they would require an ALTER TABLE REBUILD, CREATE / DROP of a Clustered Index, or a change in the Data Compression setting (all of which rebuild the heap) before the data pages are written optimally. If Heaps are not fully aware of "whole pages" (until the Heap is rebuilt) and don't know when the page is definitely full, then they wouldn't know when to initiate the page compression operation (when dealing with updates and single-row inserts).

Another technicality that would further limit some Heaps from auto-applying Page Compression (even if they otherwise could) is that applying the compression would require all Nonclustered Indexes for that Heap (if any exist) to be rebuilt. As that linked page for "Data Compression" also states:

Changing the compression setting of a heap requires all nonclustered indexes on the table to be rebuilt so that they have pointers to the new row locations in the heap.

The "pointers" being referred to are the Row IDs (RIDs), which are a combination of: FileID, PageID, and slot/position on the page. These RIDs are copied into Nonclustered Indexes. Being a precise physical location, they are sometimes faster lookups than traversing a b-tree with the Clustered Index keys. But, one drawback of a physical location is that it can change, and that is the issue here. Clustered Indexes, however, do not suffer from this problem because their Key values are copied into Nonclustered Indexes as the pointer back to the Clustered Index. And Key values remain the same, even when their physical location changes.

Also see:

the "Managing Heaps" section of the Microsoft Docs page for Heaps (Tables without Clustered Indexes):

To rebuild a heap to reclaim wasted space, create a clustered index on the heap, and then drop that clustered index.
the "Considerations for When You Use Row and Page Compression" section of the Microsoft Docs page for Data Compression:
When a heap is configured for page-level compression, pages receive page-level compression only in the following ways:
- Data is bulk imported with bulk optimizations enabled.
- Data is inserted using INSERT INTO ... WITH (TABLOCK) syntax and the table does not have a nonclustered index.
- A table is rebuilt by executing the ALTER TABLE ... REBUILD statement with the PAGE compression option.
And the statement quoted in the question.

Best Answer

Related Solutions

Sql-server – PAD_INDEX and FILLFACTOR on clustered Identity index

Sql-server – Aren’t two writes required to update a clustered index record

Related Question