Sql-server – Question about non-clustered index storage in SQL Server

indexsql serverstorage-engine

I have a table people with two non-clustered indexes.

create table people(
 id_person int,
 first_name varchar(50),
 last_name varchar(50),
 city varchar(100),
 state char(2),
 zip_code int
)

CREATE INDEX id_first_name_last_name ON people(first_name, last_name, id_person)
CREATE INDEX id_last_name_first_name ON people(id_person, last_name, first_name)

Each index refers to the same columns but in a different order.

— Insert some rows

insert into people values(1,'joe','smith','new york', 'NY', 10701)
insert into people values(2,'john','smith','new york', 'NY', 10701)
insert into people values(3,'joyce','smith','new york', 'NY', 10701)
insert into people values(4,'jocelyn','smith','new york', 'NY', 10701)

The size of each index is as follows

TableName   IndexName            IndexID    Indexsize(KB)
people          NULL                     0           16
people          id_first_name_last_name  2           16
people          id_last_name_first_name  3           16

Each index is using 16KB.

Now if I add a clustered index on id_person

create clustered index ix_id_people on people(id_person)

I end up with the same amount of space being used by each index.

TableName   IndexName            IndexID    Indexsize(KB)
people          NULL                     1           16
people          id_first_name_last_name  2           16
people          id_last_name_first_name  3           16

The only difference is now instead of a HEAP, I have a clustered index.

I was expecting that with a clustered index on id_person, the storage engine wouldn't need to include id_person anymore in each non-clustered index.

Questions:

Are the values for each column referenced by a non-clustered index stored only for the first non-clustered index and then referenced by all additional non-clustered indexes?

If a non-clustered index references the clustered index column, are the values from that column copied physically onto each non-clustered index or just simply referenced?

Update 14:55

Here is the query being used to calculate index size

SELECT
OBJECT_NAME(i.OBJECT_ID) AS TableName,
i.name AS IndexName,
i.index_id AS IndexID,
8 * SUM(a.used_pages) AS 'Indexsize(KB)'
FROM sys.indexes AS i
JOIN sys.partitions AS p ON p.OBJECT_ID = i.OBJECT_ID AND p.index_id = i.index_id
JOIN sys.allocation_units AS a ON a.container_id = p.partition_id
GROUP BY i.OBJECT_ID,i.index_id,i.name
ORDER BY OBJECT_NAME(i.OBJECT_ID),i.index_id

Best Answer

Nonclustered indexes always include a row locator.

For a heap this will be an 8 byte RID (File:Page:Slot). On a table with a clustered index it will be the clustered index key column(s). And it will always be the copied values not a pointer to the values. This duplication of CI key values into all non clustered indexes is why it is often recommended that the CI key be narrow and not frequently updated.

In the table shown in the question the Clustered index key is a 4 byte integer and potentially may also include a 4 byte uniqueifier for any duplicate key values.

In your case as the NCIs are not declared as unique the CI key will be appended to the NCI key.

For unique non clustered indexes the CI key would be added as included column(s) in the leaf pages unless explicitly made part of the key.

See Kalen Delaney: More About Nonclustered Index Keys for some additional information about how you can see this for yourself.

With these 4 rows of data all three indexes only consume a single 8KB data page.

SELECT index_id,
       index_level,
       page_count,
       record_count
FROM   sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('people'), NULL, NULL, 'DETAILED') 

Returns

+----------+-------------+------------+--------------+
| index_id | index_level | page_count | record_count |
+----------+-------------+------------+--------------+
|        1 |           0 |          1 |            4 |
|        2 |           0 |          1 |            4 |
|        3 |           0 |          1 |            4 |
+----------+-------------+------------+--------------+

The additional page shown in use by sys.allocation_units.total_pages is an IAM page. This is not used for storing data but just for tracking the pages and extents comprising the index.