Sql-server – ny difference with specifying the primary key as an include column in a nonclustered index

bookmark-lookupindexnonclustered-indexsql serversql-server-2016

Don't nonclustered indexes inherently store a reference to the primary key on a table so that it can do a key lookup as needed?…if so, is it any less or more performant to specify the primary key as an included column when creating a nonclustered index?

Side question, why does the nonclustered index default to storing the primary key and not the clustered index fields to do a key lookup on the table?…In the cases where the primary key is not the clustered index, isn't it slower for the key lookup to occur whereas if it stored the clustered index it could do the lookup that way?

Best Answer

Nonclustered indexes contain a row locator back to the base table.

This is a clustered index key for rowstore tables with a clustered index or a physical RID (file/page/slot) for heaps.

So the parts in your question assuming that it uses the primary key (in the event that these are different) are moot.

The row locator is added to the key for non unique nonclustered indexes and as an included column to non clustered indexes declared as unique.

There is no performance advantage of specifying this explicitly for the included case. For the case that it is implicitly added to the key you may need to do this explicitly if your queries would benefit from a different asc/desc direction or multi column ordering than the implicit would give you.

You may prefer to be explicit about it for readability of your code for developers unfamiliar with the implicit behaviour or if your queries would still need those columns in the index even if the clustered index definition was to change in the future.

Regarding the followup question in the comments

Does that mean it's possible a key lookup might be performed from a nonclustered index for the primary key (when the primary key is not the clustered index) if that primary key is in the select list?

Yes, you can see this with the below.

CREATE TABLE #T
(
PK INT PRIMARY KEY NONCLUSTERED,
CI INT INDEX CIX CLUSTERED,
OtherCol INT INDEX NCIX NONCLUSTERED
)

SELECT *
FROM #T WITH (FORCESEEK)
WHERE OtherCol = 10

enter image description here

  1. The output list of the seek on NCIX is Uniq1001, CI, OtherCol
  2. The values of CI, Uniq1001 are then used by the lookup to seek into the clustered index...
  3. ... and the value for PK is returned from that lookup.

If you were wondering about Uniq1001 this is a part of the clustered index key silently added by the system to act as a uniqueifier as the clustered index was not declared as unique. It is empty except for rows where duplicate exist/existed for the value of CI.