Let's assume you have a table groups
and a table item
. Each item belongs to exactly one group. It is an inherent part of that group. An item cannot exist outside of a group and it cannot be moved to another group.
When trying to decide on a primary key for the item
table, what should I use?
Should I make up an artificial global serial key like this:
CREATE TABLE items
(
item serial PRIMARY KEY,
group integer NOT NULL REFERENCES groups(group),
);
… or should I rather use a composite key and per-group item serial like this:
CREATE TABLE items
(
group integer NOT NULL REFERENCES groups(group),
item integer NOT NULL,
PRIMARY KEY(group, item)
);
The reason why I'm leaning more towards the second solution is that the post URL will always show the group and item, so it makes sense to have both of them as the composite primary key. In case of the first solution, the URL contains superfluous information because the group ID can already be deduced from the item ID alone. The URL structure is given, however, and cannot be changed.
The disadvantage of the second solution is that you have to manage a per-group serial (i.e. each item integer should start from 0 for each group).
What's better in terms of best practices, normalization and performance? Or is it simply a matter of taste?
Best Answer
Much of this is a matter of taste and style. And more importantly: specific requirements and consistent conventions. However, there are good reasons for this generic advice:
If you have an
item_id
, better make it unique and ideally a surrogate primary key without any specific meaning for the number itself. That's cheap and versatile. Per-group (notper-row
) serials are typically expensive, pointless effort. If you desire a gap-less numbering per group, add some kind of rank column or userow_number()
in a view. And don't rely on a gap-less numbering if you can avoid it. Sooner or later you'll have to delete or move around items.It is convenient for maintenance to have a single column primary key that follows a naming convention (unless you have an obvious natural pk).
Don't use
group
as identifier. It is a reserved word. Even if these are just symbolic names.If you have queries by
grp_id
, add another index:You might want to make it a
UNIQUE INDEX
for documentation, but that's redundant sinceitem_id
is unique already. With two integer columns, a multicolumn index is just as big as an index on justgrp_id
and there seem to be use cases where you need these two columns, so there is the chance for a covering index. This should be the optimum. More rationale here:Is a composite index also good for queries on the first field?