Sql-server – the benefit of two FK, when one of them can be deduced from another table

foreign keysql server

The third party Microsoft SQL Server database I'm working on is using the structure which can be illustrated with this example:

Let's take three tables:

  • Shop which is the topmost level and corresponds, for example, to an e-commerce website, given that several websites are using the same database,

  • Category which is a logical category of products within a Shop,

  • Product which belongs to a Category (given that the category is mandatory, so there should be no products which don't belong to any category).

Category has one foreign key to Shop. The Product has two foreign keys: one to Category, another to Shop.

If I were designing a similar database, I would have put only one foreign key to Product, linking it only to a Category. IMO, it:

  1. Simplifies the schema,

  2. Avoids the risk of inconsistent state, where product 1 belongs to category 1 and site 2, but category 1 itself belongs to site 1,

  3. (Avoids redundant information to waste the place in the database),

  4. Doesn't make it particularly more difficult to query data. Even if there are cases (for example a search) where the website would need products without carrying too much about the categories, but still considering to what shop they belong, making two joins instead of one is not a big deal.

Why was this third party database done in the way it's done? Are there benefits from this approach?

Best Answer

Category has one foreign key to Shop. The Product has two foreign keys: one to Category, another to Shop.

There should be a single foreign key referencing two columns in categories.

create table products (
  ...
  primary key (shop_id, category_id, product_id),
  foreign key (shop_id, category_id) 
    references categories (shop_id, category_id)
);

Think about it this way. Conceptually, normalization starts with a single relation that contains all the attributes.

shop_name  category_name    product_name
--
Wibble     Chain saws       Stihl 350
Wibble     Chain saws       Poulan 3X
Wibble     Pole saws        Black & Decker 14 foot electric
Wibble     Pole saws        Corona 15 foot

Thursby    Pole saws        Hitachi electric
Thursby    Pole saws        Remington electric
Thursby    Chain saws       Husqvarna 460
Thursby    Chain saws       Poulan 3x

It should be clear that the only candidate key is {shop_name, category_name, product_name}. It's in 5NF. There's no redundant data.

Replacing text with ID numbers won't improve that.

A design that requires you to chase ID numbers through a hierarchy of tables is an anti-pattern. It's like building an IMS database in SQL. (IMS was a problem the relational model intended to solve.)