Sql-server – Aren’t two writes required to update a clustered index record

I was reading the article on indexes at simple-talk, where it is written that

If a heap has a non-clustered index on it (as the primary key), and data is inserted into the table, two writes have to occur. One write for inserting the row, and one write for updating the non-clustered index. On the other hand, if a table has a clustered index as the primary key, inserts take only one write, not two writes. This is because a clustered index, and its data, are one in the same. Because of this, it is faster to insert rows into a table with a clustered index as the primary key than it is to insert the same data into a heap that has a non-clustered index as its primary key. This is true whether or not the primary key is monotonically increasing or not.

Isn't it wrong? Clustered indexed naturally resort to B+ trees, where only the keys are stored in intermediate nodes (unlike B-trees, where the entire record is stored. That is why B+ trees can fit more keys in a single page, resulting it being bulky in width but shorter in height), so all records are stored in the leaf pages (the pages themselves being logically sorted through linked-lists, while the data in each page being physically sorted). So if a record has to be updated, say the value 1 has to be updated to 7, won't the update need to be applied to both the key in the clustered index top node (this may, in cases, cause a re-structuring of the entire structure) and the corresponding value in the record in the leaf-page?

enter image description here
Update: Okay, I did some study, and found that apart from the initial tree structure (where some values must be present twice, for example the key values in the node), when new values are inserted, they just fit into the leaf page, whereas the tree gets restructured to accommodate that. However, when, say 5 values are inserted, the 3rd value may cause the first inserted value (which is currently occupying only leaf level space) to cascade up, thus causing it to be written twice (once in the leaf level, other in the index level). Of course, such re-writes (although they don't occur at the time of insertion, they can occur later) will be much less compared to the two-time write which occurs each time there is an insertion into a heap with NCI, but still isn't it wrong to say that there is no re-write?

Best Answer

I think the problem here is a difference in terminology.

The "number of writes" that's usually referred to is the number of object accesses, rather than the number of pages that get touched by the physical operation.

The reason why that's usually used as a metric in discussion is because it's a more "stable" and meaningful number to talk about. As we're getting into here, the number of pages touched by an INSERT statement for even a single row depends on many factors, so it's not a very useful quantity outside your own environment and situation.

The one thing I would pick at from the article quote is this (emphasis mine):

One write for inserting the row, and one write for updating the non-clustered index.

This may be confusing. Inserting a row into the base table would involve an insert to the base table, and also an insert into each nonclustered index (ignoring special index features), not an update.

So if a record has to be updated, say the value 1 has to be updated to 7, won't the update need to be applied to both the key in the clustered index top node (this may, in cases, cause a re-structuring of the entire structure) and the corresponding value in the record in the leaf-page?

Yes, assuming the column that was updated is in the index key. However, this is still a single object access, and hence a "single write."

Best Answer

Related Solutions

Sql-server – Need for reaching data through clustered index with a non-clustered index

Sql-server – Finding exact row in Clustered index leaf page

Related Question