Performance Optimisation. By removing either the PK or an Index, there
will be less pages needed for my indexes = faster writes, plus also
maintenance/operational benefits, i.e. one less index to keep
defragged etc.
You need to be able to prove that what's proposed will actually help (cost vs. benefit). For what reason is this change being considered? Are there actually performance issues with this table, or did it just "look wrong"?
Here are some other questions that will help you come to the best decision for your environment:
How much time would be saved in the maintenance window? In the backup window?
How much storage space would this save (data files, log files, backups, etc.)?
Is INSERT
performance on this table really a bottleneck right now? How much would it improve? Is removing an index the best strategy to fix that problem?
Will this cause problems with database tools and frameworks (ORMs particularly) that expect each table to have a primary key and not just a unique index? Transactional Replication requires a primary key on published tables.
Is a self-documenting database schema important?
Despite its limited use, is the narrowness of the primary key index still allowing the optimizer to produce more efficient plans for certain queries? (Use sys.dm_db_index_usage_stats
to find out.)
Personally speaking, from what you've told us, I would leave it alone until it can be proven that both (a) the extra index is a problem, and (b) removing it is the solution.
For a moment I thought one might be able to use a pre-existing text_pattern_ops
index with the USING INDEX
clause when adding a UNIQUE CONSTRAINT
. But that fails, because:
ERROR: index "book2_name_like" does not have default sorting behavior
Per documentation:
The index cannot have expression columns nor be a partial index. Also,
it must be a b-tree index with default sort ordering. These
restrictions ensure that the index is equivalent to one that would be
built by a regular ADD PRIMARY KEY
or ADD UNIQUE
command.
For instance, a unique index like that would allow a FK constraint referencing it, but perform terribly, because it does not support standard operators.
Per documentation:
Note that you should also create an index with the default operator
class if you want queries involving ordinary <
, <=
, >
, or >=
comparisons to use an index.
So to answer the question:
If you need a UNIQUE CONSTRAINT
(among other reasons: to reference it with a FK), your first variant with constraint and index is the only option. Additionally, the default operator class of the index created by the constraint supports more operations (like sorting in default sort order).
If you don't need any of that go with your second variant because, obviously, just one index is cheaper to maintain: just a UNIQUE text_pattern_ops
index.
Differences between index and constraint:
Alternative with COLLATE "C"
Instead of creating two indexes, there is another alternative for xxx
_pattern_ops
indexes that may be preferable. The documentation:
The difference from the default operator classes is that the values
are compared strictly character by character rather than according to
the locale-specific collation rules. This makes these operator
classes suitable for use by queries
involving pattern matching expressions (LIKE
or POSIX regular
expressions) when the database does not use the standard "C" locale.
And:
The index automatically uses the collation of the underlying column.
You can create the column without collation (using COLLATE "C"
). Then the default operator class behaves the same way as text_pattern_ops
would - plus the index can be used with all the standard operators.
CREATE TABLE book2 (
book_id serial PRIMARY KEY,
book text NOT NULL COLLATE "C" UNIQUE -- that's all!
);
Now, LIKE
can use the index:
SELECT * FROM book2 WHERE book LIKE 'foo%';
But ILIKE
still can't:
SELECT * FROM book2 WHERE book ILIKE 'foo%';
SQL Fiddle.
Consider a trigram index using the additional module pg_trgm for a more general solution.:
Best Answer
Yes, given the constraints in the question, particularly that the primary key column is the leading column in the indexes. Also assuming the primary key never changes.
Not necessarily.
The optimizer can indeed infer uniqueness without marking the nonclustered index unique.
Marking the index unique may introduce a Split-Sort-Collapse combination in execution plans that change an index key. The extra Sort in particular has the potential to be performance-affecting.
On the other hand, not marking the index unique risks data integrity if the primary key is ever changed.
Example
Uniqueness
Split, Sort, Collapse
Note the split-sort-collapse plan is also a wide (per-index) update.
Uniqueness is a huge topic though. I would normally mark something that is unique as unique, unless there is a good reason not to. Some further reading from my blog:
To anticipate comments about heap tables: Most tables benefit from being clustered. You need good reasons to choose a heap structure, especially from a space management point of view, if the table ever experiences deletes. Updates can also introduce performance impacts if columns expand beyond the space available on the original page (forwarded records).