Sql-server – Need for reaching data through clustered index with a non-clustered index

clustered-indexnonclustered-indexsql server

I have found that when a table has both clustered and non-clustered indexes (on different columns), the leaf level non-clustered pages, instead of pointing to the data row, point to the node of the clustered index, from where another search is instituted to find the data row. What is the point of this extra level of indirection? If the clustered index has, say, 8 levels, then the indirection from the NCI leaf page to the CI root would have to traverse through these 8 levels to reach the data. Why not store the normal RID in the NCI leaf page so that we can access the data at once without going through the CI index structure?

Best Answer

The reason for this is that the "fixed" physical location of your row - the RID (or row identifier) might (and will!) change over time - think page splits that occur when a row needs to be inserted into a table on a page that's already full.

Updating those RIDs in all the nonclustered indices that exist on a given table is quickly becoming both a hassle, and a huge performance killer. You might have 5, 10, 20 nonclustered indices on your table, and SQL Server would have to scan all those indices (basically scanning the whole index, all rows in the index, and that 10, 20 times) and update all RIDs.... that's just not practical - very quickly so.

If you store the value of the clustering index as a "row pointer" instead, that value should typically never change - and most definitely it doesn't need to be updated every time a page is split. Yes, it does involve a second index seek operation - the key lookup - but for simple scenarios, retrieving a single row or a few rows, that's still much more efficient than anything else.