Sql-server – Will a nonclustered index with a unique column always address all queries filtering that column first

indexindex-tuningnonclustered-indexsql serversql-server-2008-r2

I have found large heap tables that are over-indexed and for example have multiple different nonclustered indexes with different columns, but some of those indexes have the (only one present) primary key column set as their first index column.

I made some investigation, played around with indexes and queries and made my conclusions that I would like to see confirmed by professionals to be sure.

Question 1: Am I right to say it is nonsense to have multiple indexes based on the primary key instead of only ONE index based on the primary key column as the first and ONLY index column, because all queries using this pk for filtering or joining return only ONE row? (Assuming additional filtered, joined or selected columns are put into the index as included columns to entirely cover the queries and avoid RID lookups)

Question 2: Is it right to say that nonclustered indexes based on the primary key of a table should always be created as UNIQUE indexes? (a setting that is optional for whatever reason since sql server should already know that an index including the pk will be unique…)

Best Answer

  1. Yes, given the constraints in the question, particularly that the primary key column is the leading column in the indexes. Also assuming the primary key never changes.

  2. Not necessarily.

    The optimizer can indeed infer uniqueness without marking the nonclustered index unique.

    Marking the index unique may introduce a Split-Sort-Collapse combination in execution plans that change an index key. The extra Sort in particular has the potential to be performance-affecting.

    On the other hand, not marking the index unique risks data integrity if the primary key is ever changed.

Example

CREATE TABLE dbo.Test
(
    pk integer PRIMARY KEY NONCLUSTERED,
    c1 integer NOT NULL,
    c2 integer NOT NULL
);

-- Not unique on pk, c1
CREATE NONCLUSTERED INDEX ic1 ON dbo.Test (pk, c1);

-- Unique on pk, c2
CREATE UNIQUE NONCLUSTERED INDEX ic2 ON dbo.Test (pk, c2);

Uniqueness

-- Neither plan has an aggregate
SELECT DISTINCT T.c1 FROM dbo.Test AS T WHERE T.pk = 1;
SELECT DISTINCT T.c2 FROM dbo.Test AS T WHERE T.pk = 1;

No aggregates

Split, Sort, Collapse

-- No split, sort, collapse
UPDATE dbo.Test SET c1 = CHECKSUM(NEWID());

Simple update

-- Split, sort, collapse updating unique key
UPDATE dbo.Test SET c2 = CHECKSUM(NEWID());

SSC plan

Note the split-sort-collapse plan is also a wide (per-index) update.

Uniqueness is a huge topic though. I would normally mark something that is unique as unique, unless there is a good reason not to. Some further reading from my blog:


To anticipate comments about heap tables: Most tables benefit from being clustered. You need good reasons to choose a heap structure, especially from a space management point of view, if the table ever experiences deletes. Updates can also introduce performance impacts if columns expand beyond the space available on the original page (forwarded records).