I'm currently evaluating a schema for a hierarchical data structure. My main problem is how to design the schema so that it prevents inconsistent data (a foreign key referencing a row from a different hierarchy). The two variants I have come up with use either a composite key or a surrogate key.
Requirements
- Strict consistency checking per hierarchy, e.g. a Part in Hierarchy A cannot reference a Type in Hierarchy B (see the example schemas below).
- The constraints have to be checked in the database, not in application code, because the database is accessed by several different applications.
- The design must scale well and be extensible with additional hierarchy levels.
Variant 1: Composite Key
In this variant, the foreign keys are part of the composite primary key. With this, the database automatically checks that a Part and its Type are assigned to the same Item (the same hierarchy), because the ItemId is part of every composite primary key.
This approach seems troublesome to me, because the model is not easy to extend. For example, if I decide to create a PartMetadata table that holds metadata for a single Part, I have to include Part's whole composite key, even though the metadata itself has no connection to the hierarchy.
Item
ItemId (PK)
Name
Type
TypeId (PK)
ItemId (PK, FK)
Name
SubItem
SubItemId (PK)
ItemId (PK, FK)
Name
Part
PartId (PK)
ItemId (PK, FK)
SubItemId (PK, FK)
TypeId (PK, FK)
Name
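To make Variant 1 concrete, here is a sketch of the DDL (SQLite is my choice for illustration; table and column names come from the schema above). Note that Part needs two composite foreign keys, one to SubItem and one to Type, each including ItemId:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE Item (
    ItemId INTEGER PRIMARY KEY,
    Name   TEXT
);
CREATE TABLE Type (
    TypeId INTEGER,
    ItemId INTEGER REFERENCES Item (ItemId),
    Name   TEXT,
    PRIMARY KEY (ItemId, TypeId)
);
CREATE TABLE SubItem (
    SubItemId INTEGER,
    ItemId    INTEGER REFERENCES Item (ItemId),
    Name      TEXT,
    PRIMARY KEY (ItemId, SubItemId)
);
-- ItemId occurs only once in Part and feeds both composite FKs, so a
-- Part's SubItem and Type are forced into the same hierarchy by DDL alone.
CREATE TABLE Part (
    PartId    INTEGER,
    ItemId    INTEGER,
    SubItemId INTEGER,
    TypeId    INTEGER,
    Name      TEXT,
    PRIMARY KEY (ItemId, SubItemId, TypeId, PartId),
    FOREIGN KEY (ItemId, SubItemId) REFERENCES SubItem (ItemId, SubItemId),
    FOREIGN KEY (ItemId, TypeId)    REFERENCES Type    (ItemId, TypeId)
);

INSERT INTO Item VALUES (1, 'Hierarchy A'), (2, 'Hierarchy B');
INSERT INTO Type    (TypeId, ItemId, Name)    VALUES (1, 1, 'Type in A');
INSERT INTO SubItem (SubItemId, ItemId, Name) VALUES (1, 1, 'Sub in A'),
                                                     (1, 2, 'Sub in B');
""")

# A Part fully inside Hierarchy A is accepted.
conn.execute("INSERT INTO Part (PartId, ItemId, SubItemId, TypeId, Name) "
             "VALUES (1, 1, 1, 1, 'ok')")

# A Part under Hierarchy B pointing at a Type that exists only in A
# violates the composite FK to Type and is rejected.
cross_rejected = False
try:
    conn.execute("INSERT INTO Part (PartId, ItemId, SubItemId, TypeId, Name) "
                 "VALUES (2, 2, 1, 1, 'bad')")
except sqlite3.IntegrityError:
    cross_rejected = True
print(cross_rejected)
```

Because ItemId appears only once per Part row, a "SubItem from A with a Type from B" is not merely rejected; it is unrepresentable.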
Variant 2: Surrogate Key
In this case I would have to define additional constraints or triggers to ensure that it is not possible to insert a Part whose Type belongs to a different Item than its SubItem.
Item
ItemId (PK)
Name
Type
TypeId (PK)
ItemId (FK)
Name
SubItem
SubItemId (PK)
ItemId (FK)
Name
Part
PartId (PK)
SubItemId (FK)
TypeId (FK)
Name
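To make Variant 2 concrete, here is a sketch of one way to add the missing check (SQLite again; the trigger name and error message are mine). A BEFORE INSERT trigger rejects a Part whose SubItem and Type resolve to different Items; a matching UPDATE trigger would also be needed and is omitted for brevity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Item    (ItemId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Type    (TypeId INTEGER PRIMARY KEY,
                      ItemId INTEGER REFERENCES Item (ItemId), Name TEXT);
CREATE TABLE SubItem (SubItemId INTEGER PRIMARY KEY,
                      ItemId INTEGER REFERENCES Item (ItemId), Name TEXT);
CREATE TABLE Part    (PartId INTEGER PRIMARY KEY,
                      SubItemId INTEGER REFERENCES SubItem (SubItemId),
                      TypeId    INTEGER REFERENCES Type (TypeId), Name TEXT);

-- The plain FKs above cannot see the hierarchy; this trigger adds the
-- missing cross-check between the Part's SubItem and Type.
CREATE TRIGGER Part_same_item
BEFORE INSERT ON Part
WHEN (SELECT ItemId FROM SubItem WHERE SubItemId = NEW.SubItemId)
  <> (SELECT ItemId FROM Type    WHERE TypeId    = NEW.TypeId)
BEGIN
    SELECT RAISE(ABORT, 'Part mixes two hierarchies');
END;

INSERT INTO Item    VALUES (1, 'Hierarchy A'), (2, 'Hierarchy B');
INSERT INTO SubItem VALUES (10, 1, 'Sub in A');
INSERT INTO Type    VALUES (20, 1, 'Type in A'), (21, 2, 'Type in B');
""")

# SubItem and Type both belong to Item 1: accepted.
conn.execute("INSERT INTO Part VALUES (1, 10, 20, 'ok')")

# SubItem from A combined with a Type from B: the trigger aborts the insert.
rejected = False
try:
    conn.execute("INSERT INTO Part VALUES (2, 10, 21, 'bad')")
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

Unlike Variant 1, the inconsistent row is representable here; it is only kept out by procedural code, which is exactly the trade-off the question describes.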
Questions
- Should the surrogate solution always be favored? In which cases is the composite-key solution useful?
- Which solution is more maintainable in the long term, especially as the hierarchy grows?
- Which solution offers better performance (reporting, CRUD)? Does the surrogate solution take a significant hit from its check constraints or triggers?
- Are there other options? (Both of my variants seem less than ideal to me.)
- Does anyone have experience with both variants and can share it?
Best Answer
With the 1st design, you can enforce that a Part is related (through SubItem and Type) to the same Item. With the 2nd, you can't through DDL alone: as you already mentioned, you need triggers to enforce that, or you have to make sure all CRUD operations go through stored procedures that take care of the restriction. Otherwise you may end up with a Part that is related to a SubItem and a Type that refer to different Items.

Some notes on your (1st) design:

It's not clear that you have the correct FOREIGN KEY constraints. Your requirement cannot be enforced by 3 simple single-column FKs in Part; it needs 2 composite ones: (ItemId, SubItemId) referencing SubItem, and (ItemId, TypeId) referencing Type.

You may not need to have PartId in the PK. Is it possible to have 2 or more parts that belong to the same SubItem and the same Type? If not, the unique (or primary key) constraint should be on (ItemId, SubItemId, TypeId). If yes, then it should be on (ItemId, SubItemId, TypeId, PartId), as you have it.

Regarding PartMetadata, or any other table that needs to reference the Part table: you don't have to include the whole composite key of Part. You can always add a surrogate UNIQUE key to Part and reference that instead. I think this is common practice in complex, deep hierarchies of entities. We can use composite PKs and FKs up to a point; when they become too wide (4-6 columns), we add surrogate unique keys to a few important tables and use them for referencing from lower tables in the model.