Nonclustered indexes always include a row locator.
For a heap this will be an 8 byte RID (File:Page:Slot). On a table with a clustered index it will be the clustered index key column(s). And it will always be the copied values not a pointer to the values. This duplication of CI key values into all non clustered indexes is why it is often recommended that the CI key be narrow and not frequently updated.
In the table shown in the question the Clustered index key is a 4 byte integer and potentially may also include a 4 byte uniqueifier for any duplicate key values.
In your case as the NCIs are not declared as unique the CI key will be appended to the NCI key.
For unique non clustered indexes the CI key would be added as included column(s) in the leaf pages unless explicitly made part of the key.
See Kalen Delaney: More About Nonclustered Index Keys for some additional information about how you can see this for yourself.
With these 4 rows of data all three indexes only consume a single 8KB data page.
SELECT index_id,
index_level,
page_count,
record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('people'), NULL, NULL, 'DETAILED')
Returns
+----------+-------------+------------+--------------+
| index_id | index_level | page_count | record_count |
+----------+-------------+------------+--------------+
| 1 | 0 | 1 | 4 |
| 2 | 0 | 1 | 4 |
| 3 | 0 | 1 | 4 |
+----------+-------------+------------+--------------+
The additional page shown in use by sys.allocation_units.total_pages
is an IAM page. This is not used for storing data but just for tracking the pages and extents comprising the index.
I think it's safe to remove the non-primary key as it's an exact duplicate of the primary key.
If one of them is used more and takes less space, did you check the fragmentation and last statistics date on both? Maybe one of them is more fragmented or has better stats.
Best Answer
Before addressing the two questions, let me briefly define the difference between clustered & non-clustered indexes.
The clustered index is organized by the key columns. It also includes every other column as part of the row structure (ie, it has the entire row).
The nonclustered index is also organized by the key columns. It implicitly includes the clustering key columns (if the table is clustered), or a pointer to the row (if the table's a heap). If any
INCLUDE
columns are explicitly specified, they will also be included in the index structure.There is a corner case where it makes sense to have a non-clustered index "duplicating" the clustered index. If you have a query that frequently scans the table, and ONLY makes use of the clustering key column, the query optimizer will prefer to use the non-clustered index. The non-clustered index does not contain the full row data, and thus it will take up less physical space. Because it takes up less physical space, SQL Server can scan the table with fewer IOs, and will make use of it for performance reasons.
You can query
sys.dm_db_index_usage_stats
, you can see the number ofuser_seeks
anduser_scans
on the two indexes. This will help you see how the two indexes are being used, and determine the usefulness of the two indexes.As a general rule, most tables should have a clustered index. (That's a whole other topic). Not knowing any details about your index usage or data access, I'd guess that if you were to drop one, you would drop the nonclustered index (but, that really is a guess based on what I know).
Depending on the clustering key, the data, the data access patterns, etc., the most correct answer might be to drop the clustered index and create a different clustered index. You may want to read more on Effective Clustered Indexes.