Solving supertype-subtype relationship without sacrificing data consistency in a relational database

consistencydatabase-designsubtypes

Trying to get better at designing databases, I noticed I'm always stuck trying to solve variations of the exact same problem.

Here is an example using common requirements:

  1. An online store sells different categories of product.
  2. The system must be able to retrieve the list of all product categories, say food and furniture.
  3. A customer may order any product and retrieve his order history.
  4. System must store specific properties depending on the product category ; say the expiration_date and calories for any food product and manufacture_date for any furniture product.

If it wasn't for requirement 4, the model could be quite straightforward:

enter image description here

Problem is trying to solve requirement 4. I thought of something like this:

enter image description here

In this approach, the relationships product-furniture and product-food are supertype-subtype (or superclass-subclass) associations; the primary key of the subtype is also a foreign key to the supertype primary key.

However, this approach can not guarantee the category referenced via a foreign key to the product will be consistent with its actual subtype. For instance, nothing stops me from setting food category to a product tuple having a subtype row in the Furniture table.

I read various articles about inheritance in modelling relational databases, especially this one and this one which were very helpful but didn't solve my problem for the reason mentioned above. But whatever model I come with, I'm never satisfied with the data consistency.

How can I solve requirement 4 without sacrificing data consistency ? Am I going all wrong here ? If so, what would be the best way to solve this problem based on these requirements ?

Best Answer

One common way is to add a classifier that is "inherited" like:

CREATE TABLE products
( product_id ... NOT NULL PRIMARY KEY
, ...
, product_type ... NOT NULL 
,     UNIQUE (product_id, products_type)
,     CHECK (product_type IN ('food', 'furniture'))
);

product_type would typically be a code of some kind. Might be a foreign key to a "lookup" table instead of a check constraint. For the sub-types:

CREATE TABLE food
( product_id ... NOT NULL PRIMARY KEY
, ...
, product_type ... DEFAULT 'food' NOT NULL
,    FOREIGN KEY (product_id, product_type)
     REFERENCES products (product_id, product_type)
,    CHECK (product_type = 'food')
);

and a similar one for furniture. The constraints guarantee that product_type is consistent between super- an sub- tables.

There are products (I've heard) that allow sub-selects in CHECK constraints, but the majority do not. For such product something like:

CREATE TABLE food
( product_id ... NOT NULL PRIMARY KEY
, ...
,    FOREIGN KEY (product_id)
     REFERENCES products (product_id)
,    CHECK (
         (SELECT product_type 
          FROM products p 
          WHERE p.product_id = product_id) = 'food'
     )
);

could be used.

An alternative to the latter is to use before triggers for insert/update, and signal an exception if the wrong product_type is used. Personally I don't fancy using procedural code for integrity constraints, but I guess it is a matter of taste.