Solving supertype-subtype relationship without sacrificing data consistency in a relational database

Tags: consistency, database-design, subtypes

Trying to get better at designing databases, I noticed I'm always stuck trying to solve variations of the exact same problem.

Here is an example using common requirements:

1. An online store sells different categories of product.
2. The system must be able to retrieve the list of all product categories, say food and furniture.
3. A customer may order any product and retrieve his order history.
4. The system must store specific properties depending on the product category; say the expiration_date and calories for any food product and manufacture_date for any furniture product.

If it wasn't for requirement 4, the model could be quite straightforward:

The problem is requirement 4. I thought of something like this:

In this approach, the relationships product-furniture and product-food are supertype-subtype (or superclass-subclass) associations; the primary key of the subtype is also a foreign key to the supertype primary key.

However, this approach cannot guarantee that the category referenced by the product's foreign key is consistent with its actual subtype. For instance, nothing stops me from assigning the food category to a product tuple that has a subtype row in the Furniture table.

I read various articles about inheritance in relational database modelling, especially this one and this one, which were very helpful but didn't solve my problem for the reason mentioned above. Whatever model I come up with, I'm never satisfied with the data consistency.

How can I solve requirement 4 without sacrificing data consistency? Am I going about this all wrong? If so, what would be the best way to solve this problem based on these requirements?

Best Answer

One common way is to add a classifier that is "inherited" like:

CREATE TABLE products
( product_id ... NOT NULL PRIMARY KEY
, ...
, product_type ... NOT NULL 
,     UNIQUE (product_id, product_type)
,     CHECK (product_type IN ('food', 'furniture'))
);

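product_type would typically be a code of some kind. It might be a foreign key to a "lookup" table instead of a check constraint. The UNIQUE constraint on (product_id, product_type) exists so that the sub-type tables can reference that pair. For the sub-types: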

CREATE TABLE food
( product_id ... NOT NULL PRIMARY KEY
, ...
, product_type ... DEFAULT 'food' NOT NULL
,    FOREIGN KEY (product_id, product_type)
     REFERENCES products (product_id, product_type)
,    CHECK (product_type = 'food')
);

and a similar one for furniture. The constraints guarantee that product_type is consistent between the super- and sub-type tables.
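To see the constraints above at work, here is a minimal sketch that runs the same pattern against an in-memory SQLite database from Python. The column types, the name column and the sample rows are assumptions made for illustration; they are not part of the original answer, and other DBMSs may report the violations differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked

conn.executescript("""
CREATE TABLE products
( product_id   INTEGER NOT NULL PRIMARY KEY
, name         TEXT    NOT NULL              -- assumed column, for illustration
, product_type TEXT    NOT NULL
,     UNIQUE (product_id, product_type)      -- target of the composite FK
,     CHECK (product_type IN ('food', 'furniture'))
);

CREATE TABLE food
( product_id      INTEGER NOT NULL PRIMARY KEY
, expiration_date TEXT    NOT NULL
, calories        INTEGER NOT NULL
, product_type    TEXT    DEFAULT 'food' NOT NULL
,    FOREIGN KEY (product_id, product_type)
     REFERENCES products (product_id, product_type)
,    CHECK (product_type = 'food')
);
""")

# Hypothetical sample data: one food product, one furniture product.
conn.execute("INSERT INTO products VALUES (1, 'apple', 'food')")
conn.execute("INSERT INTO products VALUES (2, 'chair', 'furniture')")
conn.execute(
    "INSERT INTO food (product_id, expiration_date, calories) "
    "VALUES (1, '2030-01-01', 52)"
)

def try_sql(sql):
    """Run one statement and report whether the constraints let it through."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

# A food row for the furniture product: (2, 'food') has no parent row.
wrong_subtype = try_sql(
    "INSERT INTO food (product_id, expiration_date, calories) "
    "VALUES (2, '2030-01-01', 0)"
)
# Same row, but forging the type so the FK would match: the CHECK stops it.
forged_type = try_sql(
    "INSERT INTO food (product_id, expiration_date, calories, product_type) "
    "VALUES (2, '2030-01-01', 0, 'furniture')"
)
# Re-classifying a product that already has a food row breaks the FK.
retyped = try_sql("UPDATE products SET product_type = 'furniture' WHERE product_id = 1")

print(wrong_subtype, forged_type, retyped)
```

All three attempts should be refused: the composite foreign key ties the sub-type row to the classifier stored in products, and the CHECK pins that classifier to the one value the sub-type table stands for.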

There are products (I've heard) that allow sub-selects in CHECK constraints, but the majority do not. For such products, something like:

CREATE TABLE food
( product_id ... NOT NULL PRIMARY KEY
, ...
,    FOREIGN KEY (product_id)
     REFERENCES products (product_id)
,    CHECK (
         (SELECT product_type 
          FROM products p 
          WHERE p.product_id = food.product_id) = 'food'
     )
);

could be used.

An alternative to the latter is to use before triggers for insert/update, and signal an exception if the wrong product_type is used. Personally I don't fancy using procedural code for integrity constraints, but I guess it is a matter of taste.
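As a rough illustration of the trigger alternative, the sketch below uses SQLite from Python. The trigger names, column types and sample rows are assumptions, and trigger syntax and the way an exception is signalled differ between DBMSs, so treat this as one possible shape rather than a portable solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products
( product_id   INTEGER NOT NULL PRIMARY KEY
, product_type TEXT    NOT NULL
);

CREATE TABLE food
( product_id INTEGER NOT NULL PRIMARY KEY REFERENCES products (product_id)
, calories   INTEGER NOT NULL
);

-- Refuse a new food row unless its parent product is classified as food.
-- IS NOT is null-safe in SQLite, so a missing parent is refused as well.
CREATE TRIGGER food_type_check_insert
BEFORE INSERT ON food
WHEN (SELECT p.product_type FROM products p
      WHERE p.product_id = NEW.product_id) IS NOT 'food'
BEGIN
    SELECT RAISE(ABORT, 'product is not of type food');
END;

-- Same rule when an existing food row is pointed at another product.
CREATE TRIGGER food_type_check_update
BEFORE UPDATE OF product_id ON food
WHEN (SELECT p.product_type FROM products p
      WHERE p.product_id = NEW.product_id) IS NOT 'food'
BEGIN
    SELECT RAISE(ABORT, 'product is not of type food');
END;

-- Hypothetical sample data.
INSERT INTO products VALUES (1, 'food'), (2, 'furniture');
""")

def attempt(sql):
    """Run one statement and report whether the triggers let it through."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.DatabaseError:
        return "rejected"

ok_insert = attempt("INSERT INTO food VALUES (1, 52)")   # parent is food
bad_insert = attempt("INSERT INTO food VALUES (2, 0)")   # parent is furniture
bad_update = attempt("UPDATE food SET product_id = 2 WHERE product_id = 1")

print(ok_insert, bad_insert, bad_update)
```

Note that these two triggers only guard the food table. Changing product_type on a row in products would need its own trigger, which is part of why the declarative version is easier to get complete.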

Related Solutions

Database schema for a product with multiple categories and hierarchical categories

What you are proposing is a good solution for your requirement of M:N products to categories and hierarchical categories.

You need to do two things to avoid a flood of updates to your intersection table.

First, you need to be sure that your categories have a stable, persistent primary key.

Second, you need to link food items to leaf categories. Don't join Cherry to Red, Healthy, Fruit and Food - just join it to Red and Healthy. Your nested sets take care of all of the secondary (and higher level) associations.
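A small sketch of how the nested sets can supply the higher-level associations, using SQLite from Python. The hierarchy (Food containing Fruit and Healthy, Fruit containing Red), the lft/rgt numbering and the table names are all assumptions made up for this example; the original question's schema is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories
( category_id INTEGER NOT NULL PRIMARY KEY   -- stable, persistent key
, name        TEXT    NOT NULL
, lft         INTEGER NOT NULL               -- nested-set left bound
, rgt         INTEGER NOT NULL               -- nested-set right bound
);

CREATE TABLE product_categories
( product_name TEXT    NOT NULL
, category_id  INTEGER NOT NULL REFERENCES categories (category_id)
, PRIMARY KEY (product_name, category_id)
);

-- Assumed hierarchy: Food > Fruit > Red, and Food > Healthy.
INSERT INTO categories VALUES
  (1, 'Food',    1, 8),
  (2, 'Fruit',   2, 5),
  (3, 'Red',     3, 4),
  (4, 'Healthy', 6, 7);

-- Cherry is linked to the leaf categories only.
INSERT INTO product_categories VALUES ('Cherry', 3), ('Cherry', 4);
""")

# Every category whose interval encloses one of the product's leaf categories
# is an ancestor (or the leaf itself), so no extra link rows are needed.
rows = conn.execute("""
SELECT DISTINCT anc.name, anc.lft
FROM product_categories pc
JOIN categories leaf ON leaf.category_id = pc.category_id
JOIN categories anc  ON anc.lft <= leaf.lft AND anc.rgt >= leaf.rgt
WHERE pc.product_name = ?
ORDER BY anc.lft
""", ("Cherry",)).fetchall()

all_categories = [name for name, _lft in rows]
print(all_categories)
```

With only two rows in the intersection table, the query still reports Cherry under Food and Fruit, so moving a category within the tree changes lft/rgt values but not the product links.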

Sql-server – How to represent class table inheritance (current DBMS-specific way please)

Ultimately, I implemented ypercube's suggestion from comments:

The "type" column can be defined as a computed (but constant) column in SQL-Server. I think it has to be PERSISTED though so it can participate in the Foreign Key constraint.

This worked well both for performance and for compatibility with my tools (Entity Framework <= 6.0):

CREATE TABLE [dbo].[Account](
    [Id] [int] IDENTITY(1000,1) NOT NULL PRIMARY KEY CLUSTERED,
    [CommunityRole] [int] NOT NULL,
    [FirstName] [nvarchar](50) NOT NULL,
    [LastName] [nvarchar](50) NOT NULL,
CONSTRAINT [UX_Derived_Relation] UNIQUE ([Id], [CommunityRole]))

CREATE TABLE [dbo].[Recruiter](
    [Id] [int] NOT NULL PRIMARY KEY CLUSTERED,
    [CommunityRole]  AS ((1)) PERSISTED NOT NULL,
    [RecruiterSpecificValue]  [int] NOT NULL,
FOREIGN KEY ([Id], [CommunityRole]) REFERENCES Account([Id], [CommunityRole]))

CREATE TABLE [dbo].[Candidate](
    [Id] [int] NOT NULL PRIMARY KEY CLUSTERED,
    [CommunityRole]  AS ((2)) PERSISTED NOT NULL,
    [CandidateSpecificValue]  [int] NOT NULL,
FOREIGN KEY ([Id], [CommunityRole]) REFERENCES Account([Id], [CommunityRole]))

This mapped well to my implementation of multiple discrete account types, Recruiter and Candidate, on my Job Board.
