Sql-server – Will a nonclustered index with a unique column always address all queries filtering that column first

indexindex-tuningnonclustered-indexsql serversql-server-2008-r2

I have found large heap tables that are over-indexed and for example have multiple different nonclustered indexes with different columns, but some of those indexes have the (only one present) primary key column set as their first index column.

I made some investigation, played around with indexes and queries and made my conclusions that I would like to see confirmed by professionals to be sure.

Question 1: Am I right to say it is nonsense to have multiple indexes based on the primary key instead of only ONE index based on the primary key column as the first and ONLY index column, because all queries using this pk for filtering or joining return only ONE row? (Assuming additional filtered, joined or selected columns are put into the index as included columns to entirely cover the queries and avoid RID lookups)

Question 2: Is it right to say that nonclustered indexes based on the primary key of a table should always be created as UNIQUE indexes? (a setting that is optional for whatever reason since sql server should already know that an index including the pk will be unique…)

Best Answer

Yes, given the constraints in the question, particularly that the primary key column is the leading column in the indexes. Also assuming the primary key never changes.
Not necessarily.

The optimizer can indeed infer uniqueness without marking the nonclustered index unique.

Marking the index unique may introduce a Split-Sort-Collapse combination in execution plans that change an index key. The extra Sort in particular has the potential to be performance-affecting.

On the other hand, not marking the index unique risks data integrity if the primary key is ever changed.

Example

CREATE TABLE dbo.Test
(
    pk integer PRIMARY KEY NONCLUSTERED,
    c1 integer NOT NULL,
    c2 integer NOT NULL
);

-- Not unique on pk, c1
CREATE NONCLUSTERED INDEX ic1 ON dbo.Test (pk, c1);

-- Unique on pk, c2
CREATE UNIQUE NONCLUSTERED INDEX ic2 ON dbo.Test (pk, c2);

Uniqueness

-- Neither plan has an aggregate
SELECT DISTINCT T.c1 FROM dbo.Test AS T WHERE T.pk = 1;
SELECT DISTINCT T.c2 FROM dbo.Test AS T WHERE T.pk = 1;

Split, Sort, Collapse

-- No split, sort, collapse
UPDATE dbo.Test SET c1 = CHECKSUM(NEWID());

-- Split, sort, collapse updating unique key
UPDATE dbo.Test SET c2 = CHECKSUM(NEWID());

Note the split-sort-collapse plan is also a wide (per-index) update.

Uniqueness is a huge topic though. I would normally mark something that is unique as unique, unless there is a good reason not to. Some further reading from my blog:

To anticipate comments about heap tables: Most tables benefit from being clustered. You need good reasons to choose a heap structure, especially from a space management point of view, if the table ever experiences deletes. Updates can also introduce performance impacts if columns expand beyond the space available on the original page (forwarded records).

Related Solutions

Sql-server – Implications of using a Unique NonClustered Index with Covering Columns instead of a Primary Key

Performance Optimisation. By removing either the PK or an Index, there will be less pages needed for my indexes = faster writes, plus also maintenance/operational benefits, i.e. one less index to keep defragged etc.

You need to be able to prove that what's proposed will actually help (cost vs. benefit). For what reason is this change being considered? Are there actually performance issues with this table, or did it just "look wrong"?

Here are some other questions that will help you come to the best decision for your environment:

How much time would be saved in the maintenance window? In the backup window?
How much storage space would this save (data files, log files, backups, etc.)?
Is INSERT performance on this table really a bottleneck right now? How much would it improve? Is removing an index the best strategy to fix that problem?
Will this cause problems with database tools and frameworks (ORMs particularly) that expect each table to have a primary key and not just a unique index? Transactional Replication requires a primary key on published tables.
Is a self-documenting database schema important?
Despite its limited use, is the narrowness of the primary key index still allowing the optimizer to produce more efficient plans for certain queries? (Use sys.dm_db_index_usage_stats to find out.)

Personally speaking, from what you've told us, I would leave it alone until it can be proven that both (a) the extra index is a problem, and (b) removing it is the solution.

Postgresql – Is unique index better than unique constraint when an index with an operator class is required

For a moment I thought one might be able to use a pre-existing text_pattern_ops index with the USING INDEX clause when adding a UNIQUE CONSTRAINT. But that fails, because:

ERROR: index "book2_name_like" does not have default sorting behavior

Per documentation:

The index cannot have expression columns nor be a partial index. Also, it must be a b-tree index with default sort ordering. These restrictions ensure that the index is equivalent to one that would be built by a regular ADD PRIMARY KEY or ADD UNIQUE command.

For instance, a unique index like that would allow a FK constraint referencing it, but perform terribly, because it does not support standard operators.
Per documentation:

Note that you should also create an index with the default operator class if you want queries involving ordinary <, <=, >, or >= comparisons to use an index.

So to answer the question:

If you need a UNIQUE CONSTRAINT (among other reasons: to reference it with a FK), your first variant with constraint and index is the only option. Additionally, the default operator class of the index created by the constraint supports more operations (like sorting in default sort order).

If you don't need any of that go with your second variant because, obviously, just one index is cheaper to maintain: just a UNIQUE text_pattern_ops index.

Differences between index and constraint:

How does PostgreSQL enforce the UNIQUE constraint / what type of index does it use?

Alternative with COLLATE "C"

Instead of creating two indexes, there is another alternative for xxx_pattern_ops indexes that may be preferable. The documentation:

The difference from the default operator classes is that the values are compared strictly character by character rather than according to the locale-specific collation rules. This makes these operator classes suitable for use by queries involving pattern matching expressions (LIKE or POSIX regular expressions) when the database does not use the standard "C" locale.

And:

The index automatically uses the collation of the underlying column.

You can create the column without collation (using COLLATE "C"). Then the default operator class behaves the same way as text_pattern_ops would - plus the index can be used with all the standard operators.

CREATE TABLE book2 (
   book_id serial PRIMARY KEY,
   book    text NOT NULL COLLATE "C" UNIQUE  -- that's all!
);

Now, LIKE can use the index:

SELECT * FROM book2 WHERE book LIKE 'foo%';

But ILIKE still can't:

SELECT * FROM book2 WHERE book ILIKE 'foo%';

SQL Fiddle.

Consider a trigram index using the additional module pg_trgm for a more general solution.:

Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL