Surrogate vs Composite Key in Hierarchical Data Structures

check-constraintsforeign keyhierarchyprimary-keysurrogate-key

I'm currently evaluating a schema for a hierarchical data structure. My main problem is how I should design the schema to prevent inconsistent data (reference of foreign key in another hierarchy). The two variants I discovered are either to use a composite or a surrogate key.

Requirements

  • Strict consistency checking per hierarchy, e.g. a Part in Hierarchy A cannot have a foreign key on a Type in Hierarchy B (see example schemas below).
  • The constraints have to be checked in the database and not with application code, as the database is used from different applications.
  • Scalable for performance and extendable with more hierarchy levels.

Variant 1: Composite Key

In this example, the foreign keys are part of the composite primary key. With this, it is automatically checked that the Type of the Part are assigned with the same Item (same hierarchy), because they are part of the composite primary key.

This approach seems troublesome to me, as it is not very easy to extend the model. For example, if I decide to create a PartMetadata table, which holds metadata for a single Part, I have to include the whole composite key, even though the metadata has no connection to the hierarchy.

Item
  ItemId (PK)
  Name

Type
  TypeId (PK)
  ItemId (PK, FK)
  Name

SubItem
  SubItemId (PK)
  ItemId (PK, FK)
  Name

Part
  PartId (PK)
  ItemId (PK, FK)
  SubItemId (PK, FK)
  TypeId (PK, FK)
  Name

Variant 2: Surrogate Key

In this case I would have to define additional constraints or triggers to ensure that it is not possible to insert a Part with a Type that belongs to a different Item.

Item
  ItemId (PK)
  Name

Type
  TypeId (PK)
  ItemId (FK)
  Name

SubItem
  SubItemId (PK)
  ItemId (FK)
  Name

Part
  PartId (PK)
  SubItemId (FK)
  TypeId (FK)
  Name

Questions

  • Should the surrogate solution always be favored? In what cases is it useful to use the composite key solution?
  • Which solution is better maintainable in the long term, especially if the hierarchy gets bigger?
  • Which solution offers better performance (reporting, CRUD)? Does the surrogate solution have a huge impact because of check constraints or triggers?
  • Are there other options (in my opinion, both my variants seem like non-ideal solutions)?
  • Has someone experience with both variants and can share his wisdom?

Best Answer

With the 1st design, you can enforce that a Part is related (through SubItem and Type) to the same Item. With the second, you can't (through DDL alone, you need triggers as you already mentioned to enforce that or make sure all CRUD operations are done through stored procedures that take care of the restriction). Otherwise you may have a Part that is related to a SubItem and a Type that refer to different Items.

Some notes on your (1st) design:

  • It's not clear that you have the correct FOREIGN KEY constraints. Your requirement cannot be enforced by 3 simple FKs in Part but needs 2 composite ones:

    Part
      PartId    (PK)
      ItemId    (PK, FK1, FK2)
      SubItemId (PK, FK1)
      TypeId    (PK,      FK2)
      Name
    
  • You may not need to have PartId in the PK. Is it possible to have 2 or more parts that belong to the same SubItem and same Type? If not, the unique (or primary key) constraint should be on (ItemId, SubItemId, TypeId). If yes, then it should be on (ItemId, SubItemId, TypeId, PartId), as you have it.

Regarding PartMetadata or any other table that can reference Part table, you don't have to include the whole composite key of Part. You can always add a new surrogate UNIQUE key in Part and use that instead. This is common practice I think in complex and high hierarchies of entities. We can use composite PKs and FKs up to a point. When they become too wide (4-6 columns), we can add surrogate unique keys in a few important tables and use them for referencing from lower tables in the model.