Sql-server – Should I mark a composite index as unique if it contains the primary key

indexsql server

Given some table with a primary key, e.g.:

CREATE TABLE Customers (
   CustomerID int NOT NULL PRIMARY KEY,
   FirstName nvarchar(50),
   LastName nvarchar(50),
   Address nvarchar(200),
   Email nvarchar(260)
   --...
)

we have a unique primary key on CustomerID.

Traditionally i might then need some additional covering indexes; for example to quickly find a user by either CustomerID or Email:

CREATE INDEX IX_Customers_CustomerIDEmail ON Customers
(
   CustomerID,
   Email
)

And these are the kinds of indexes i've created for decades.

It's not required to be unique, but it actually is

The index itself exists to avoid a table scan; it is a covering index in order to aid performance (the index is not there as a constraint to enforce uniqueness).

Today i remembered a tid-bit of information – SQL Server can use the fact that:

  • a column has a foreign key constraint
  • a column has a unique index
  • a constraint is trusted

in order to help it optimize its query execution. In fact, from SQL Server Index Design Guide:

If the data is unique and you want uniqueness enforced, creating a unique index instead of a nonunique index on the same combination of columns provides additional information for the query optimizer that can produce more efficient execution plans. Creating a unique index (preferably by creating a UNIQUE constraint) is recommended in this case.

Given that my multi-column index contains the primary key, this composite index will de facto be unique. It's not a constraint that i particularly need SQL Server to enforce during every insert or update; but the fact is that this non-clustered index is unique.

Is there any advantage in marking this de facto unique index as actually unique?

CREATE UNIQUE INDEX IX_Customers_CustomerIDEmail ON Customers
(
   CustomerID,
   Email
)

It seems to me that SQL Server could be smart enough to realize that my index already is unique by virtue of the fact that it contains the primary key.

  • But perhaps it doesn't know this, and there's an advantage for the optimizer if i declare the index as unique anyway.
  • Except perhaps that might now lead to slowdowns during inserts and updates, where it must perform uniqueness checks – where before it never had to before.
  • Unless it knows the index is guaranteed to already be unique, because it contains the primary key.

I cannot find any guidance from Microsoft about what to do when a composite index contains the primary key.

The benefits of unique indexes include the following:

  • Data integrity of the defined columns is ensured.
  • Additional information helpful to the query optimizer is provided.

Should i mark a composite index as unique if it already contains the primary key? Or can SQL Server figure out this for itself?

Best Answer

Should I mark a composite index as unique if it already contains the primary key?

Probably not. The optimizer can generally use information about the uniqueness of the contained key column anyway, so there's no real advantage.

There is also an important consequence of marking an index unique on update plans that modify keys of that index to consider:

Setup

CREATE TABLE dbo.Customers 
(
   CustomerID int NOT NULL PRIMARY KEY,
   FirstName nvarchar(50),
   LastName nvarchar(50),
   [Address] nvarchar(200),
   Email nvarchar(260)
);

CREATE NONCLUSTERED INDEX 
    IX_Customers_CustomerIDEmail 
ON dbo.Customers
(
   CustomerID,
   Email
);

-- Pretend we have some rows
UPDATE STATISTICS dbo.Customers 
WITH ROWCOUNT = 100000, PAGECOUNT = 20000;

Per-index update plan (non-unique index)

UPDATE dbo.Customers 
SET Email = N'New', [Address] = 'New Address'
WHERE Email = N'Old' 
OPTION (QUERYTRACEON 8790); -- Per-index update plan

Execution plan:

Split & Filter

The optimizer often makes a cost-based decision between updating nonclustered indexes per-row (a 'narrow' plan) or per-index (a 'wide' plan). The default strategy (except for in-memory OLTP tables) is a wide plan.

Narrow plans (where nonclustered indexes are maintained at the same time as the heap/clustered index) are a performance optimization for small updates. This optimization is not implemented for all cases - using certain features (like indexed views) means that the associated index(es) will be maintained in a wide plan.

More information: Optimizing T-SQL Queries that Change Data

In this case, I have used undocumented trace flag 8790 to force a wide update plan: The plan therefore shows the clustered and nonclustered indexes being maintained separately.

The Split turns each update into a separate delete & insert pair; the Filter filters out any rows that would not result in a change to the index.

More information: (Non-updating updates) by the SQL Server QO Team.

Per-index update plan (unique index)

-- Same index, but unique
CREATE UNIQUE INDEX IX_Customers_CustomerIDEmail ON Customers
(
   CustomerID,
   Email
)
WITH (DROP_EXISTING = ON);

UPDATE dbo.Customers 
SET Email = N'New', [Address] = 'New Address'
WHERE Email = N'Old' 
OPTION (QUERYTRACEON 8790); -- Per-index update plan

Execution plan:

Split-Sort-Collapse

Notice the extra Sort and Collapse operators when the index is marked unique.

This Split-Sort-Collapse pattern is required when updating the keys of a unique index, to prevent intermediate unique key violations.

More information: Maintaining Unique Indexes by Craig Freedman

The Sort in particular can be a problem. Not only is it an unnecessary extra cost, it may spill to disk if estimates are inaccurate.

About nonclustered keys

Another factor to consider is that nonclustered index structures are always unique, at every level of the index, even if UNIQUE is not specified. The clustering key(s) - and possibly a uniquifier if the clustered index is not marked unique - are added to a non-unique nonclustered index at all levels.

As a consequence, the following index definiton:

CREATE INDEX IX_Customers_CustomerIDEmail ON Customers
(
   Email
)
WITH (DROP_EXISTING = ON);

...actually contains the keys (Email, CustomerID) at all levels. It is therefore 'seekable' on both columns:

SELECT * 
FROM dbo.Customers AS C WITH (INDEX(IX_Customers_CustomerIDEmail))
WHERE C.Email = N'Email'
AND C.CustomerID = 1;

Seeks

More information: More About Nonclustered Index Keys by Kalen Delaney