PAD_INDEX
only applies to non leave level pages. By default those are filled full.
So specifying PAD_INDEX=ON
together with FILLFACTOR
is only useful when you specify a fillfactor smaller then 100%.
In your case, specifying PAD_INDEX
is not necessary, since by default your leave pages are how you want them. Full.
I think the problem here is a difference in terminology.
The "number of writes" that's usually referred to is the number of object accesses, rather than the number of pages that get touched by the physical operation.
The reason why that's usually used as a metric in discussion is because it's a more "stable" and meaningful number to talk about. As we're getting into here, the number of pages touched by an INSERT
statement for even a single row depends on many factors, so it's not a very useful quantity outside your own environment and situation.
The one thing I would pick at from the article quote is this (emphasis mine):
One write for inserting the row, and one write for updating the non-clustered index.
This may be confusing. Inserting a row into the base table would involve an insert to the base table, and also an insert into each nonclustered index (ignoring special index features), not an update.
So if a record has to be updated, say the value 1 has to be updated to 7, won't the update need to be applied to both the key in the clustered index top node (this may, in cases, cause a re-structuring of the entire structure) and the corresponding value in the record in the leaf-page?
Yes, assuming the column that was updated is in the index key. However, this is still a single object access, and hence a "single write."
Best Answer
While I don't know the specific internal mechanism(s) that are responsible for the differences, I can say that Heaps are managed (internally) slightly differently than Clustered Indexes (and possibly also Nonclustered Indexes) :
Deleting rows from a Heap such that one or more data pages are empty (no allocated rows) does not necessarily free up that space. You will likely need to either create, and then drop, a Clustered Index on the table, or call
ALTER TABLE [TableName] REBUILD;
(as of SQL Server 2014 ?). Please see the Microsoft Docs page for DELETE for more details and options.Inserting individual rows (i.e. not a set-based
INSERT
) into a Heap does not fill the data pages as fully as it does with Clustered Indexes. Clustered Indexes will fit rows as long as there is space for the row (data and row-overhead) plus the 2-byte overhead of the slot array. Data pages in Heaps, however, don't use the number of bytes left on the page, but instead use a very generalized indicator of how full the page is, and there aren't that many levels that are reported. The levels are something along the lines of: 0%, 20%, 50%, 80%, and 100% full. And it will switch over to 100% while there is still space for another row (and in fact, had those same number of rows been inserted in a set-based operation, then it would have filled the page as much as possible). Of course, just like with theDELETE
operations, rebuilding the Heap will pack as many rows as will fit onto the data page.Now consider the following information, taken from the "When Page Compression Occurs" section of the Microsoft Docs page for Page Compression Implementation:
Hence, it seems entirely inline with this other Heap behavior that they would require an ALTER TABLE REBUILD, CREATE / DROP of a Clustered Index, or a change in the Data Compression setting (all of which rebuild the heap) before the data pages are written optimally. If Heaps are not fully aware of "whole pages" (until the Heap is rebuilt) and don't know when the page is definitely full, then they wouldn't know when to initiate the page compression operation (when dealing with updates and single-row inserts).
Another technicality that would further limit some Heaps from auto-applying Page Compression (even if they otherwise could) is that applying the compression would require all Nonclustered Indexes for that Heap (if any exist) to be rebuilt. As that linked page for "Data Compression" also states:
The "pointers" being referred to are the Row IDs (RIDs), which are a combination of: FileID, PageID, and slot/position on the page. These RIDs are copied into Nonclustered Indexes. Being a precise physical location, they are sometimes faster lookups than traversing a b-tree with the Clustered Index keys. But, one drawback of a physical location is that it can change, and that is the issue here. Clustered Indexes, however, do not suffer from this problem because their Key values are copied into Nonclustered Indexes as the pointer back to the Clustered Index. And Key values remain the same, even when their physical location changes.
Also see:
the "Managing Heaps" section of the Microsoft Docs page for Heaps (Tables without Clustered Indexes):
the "Considerations for When You Use Row and Page Compression" section of the Microsoft Docs page for Data Compression:
And the statement quoted in the question.