PostgreSQL – Single Column vs Multicolumn Primary Key

best practicesdatabase-designpostgresqlprimary-key

Let's assume you have a table groups and a table item. Each item belongs to exactly one group. It is an inherent part of that group. An item cannot exist outside of a group and it cannot be moved to another group.

When trying to decide on a primary key for the item table, what should I use?

Should I make up an artificial global serial key like this:

CREATE TABLE items
(
    item serial PRIMARY KEY,
    group integer NOT NULL REFERENCES groups(group),
);

… or should I rather use a composite key and per-group item serial like this:

CREATE TABLE items
(
    group integer NOT NULL REFERENCES groups(group),
    item integer NOT NULL,

    PRIMARY KEY(group, item)
);

The reason why I'm leaning more towards the second solution is that the post URL will always show the group and item, so it makes sense to have both of them as the composite primary key. In case of the first solution, the URL contains superfluous information because the group ID can already be deduced from the item ID alone. The URL structure is given, however, and cannot be changed.

The disadvantage of the second solution is that you have to manage a per-group serial (i.e. each item integer should start from 0 for each group).

What's better in terms of best practices, normalization and performance? Or is it simply a matter of taste?

Best Answer

Much of this is a matter of taste and style. And more importantly: specific requirements and consistent conventions. However, there are good reasons for this generic advice:

CREATE TABLE item (
    item_id serial PRIMARY KEY,
    grp_id  integer NOT NULL REFERENCES grp(grp_id)
);
  • If you have an item_id, better make it unique and ideally a surrogate primary key without any specific meaning for the number itself. That's cheap and versatile. Per-group (not per-row) serials are typically expensive, pointless effort. If you desire a gap-less numbering per group, add some kind of rank column or use row_number() in a view. And don't rely on a gap-less numbering if you can avoid it. Sooner or later you'll have to delete or move around items.

  • It is convenient for maintenance to have a single column primary key that follows a naming convention (unless you have an obvious natural pk).

  • Don't use group as identifier. It is a reserved word. Even if these are just symbolic names.

  • If you have queries by grp_id, add another index:

    CREATE INDEX item_foo_idx ON item (grp_id, item_id);
    

    You might want to make it a UNIQUE INDEX for documentation, but that's redundant since item_id is unique already. With two integer columns, a multicolumn index is just as big as an index on just grp_id and there seem to be use cases where you need these two columns, so there is the chance for a covering index. This should be the optimum. More rationale here:
    Is a composite index also good for queries on the first field?