Should I always make an attribute id that’s an primary key

database-designprimary-keyunique-constraint

I was thinking, sometimes several columns in a table uniquely specify the whole row.
In that case, is there a need to create a special column for a primary key?
Is that good for something?

Best Answer

There are two schools of thought about this and some of it is database dependent.

In general, on PostgreSQL, I define natural primary keys wherever I can (and make these composite keys where appropriate) and I (usually) add a secondary, surrogate id key which is an integer and is used as a surrogate for the primary key in joins. I see this as a good way to navigate the positives and negatives of natural primary keys. Note that if I was on MySQL, I would probably set the id as a primary key just because of how InnoDB works regarding indexes and primary keys.

Benefits of Natural Keys

A natural key is a piece of collected data which identifies a row in a relation. Occasionally there are no natural keys but this is rare. Natural keys can be a single column or they can be a composite key of several columns. There are several important benefits to natural keys.

  1. Defining natural keys is necessary for normalization. Defining them in your DDL makes normalization mistakes more obvious. This leads to clearer database design.

  2. Forcing you to think about natural keys keeps data clear and avoids missing unique constraints.

Drawbacks of Composite Keys

Composite keys however pose performance issues when used in JOIN conditions. Heuristic planners have a harder time estimating numbers of rows returned by multi-column keys because the co-incidence rate is not necessarily known between key columns, and these may not be well distributed and this can throw planner estimates off.

Additionally this leads to wider join criteria which in turn leads to more memory used in joins. Additionally, and for obvious reasons, it takes significantly more computing power to calculate a multi-column join than a single column one.

Finally, natural keys are often "discovered" rather than defined. Changing a composite key as a join condition is a really error-prone hassle I am quite happy to live without.

Where a key is text, definitional, and the most commonly used piece of information in a table, I will often use it as the join key, but it is quite rare that all of these are met.

Database Implementation Variables

Consider two different databases, PostgreSQL which uses heap tables with external indexes (so primary key indexes are not privileged over unique indexes, and tables can be scanned in physical order), and InnoDB which uses tree structures in the table, and contains the row, essentially, in the primary key btree index. With MySQL's InnoDB, you must traverse the designated primary key on every query, even if other indexes are used, and physical-order scans are not supported.

In the first, case, performance is not adversely affected by using a non-pkey index, but in the second case it is and so you really want to have your primary key be the most frequently used unique index.