The third party Microsoft SQL Server database I'm working on is using the structure which can be illustrated with this example:
Let's take three tables:
-
Shop
which is the topmost level and corresponds, for example, to an e-commerce website, given that several websites are using the same database, -
Category
which is a logical category of products within aShop
, -
Product
which belongs to aCategory
(given that the category is mandatory, so there should be no products which don't belong to any category).
Category
has one foreign key to Shop
. The Product
has two foreign keys: one to Category
, another to Shop
.
If I were designing a similar database, I would have put only one foreign key to Product
, linking it only to a Category
. IMO, it:
-
Simplifies the schema,
-
Avoids the risk of inconsistent state, where product 1 belongs to category 1 and site 2, but category 1 itself belongs to site 1,
-
(Avoids redundant information to waste the place in the database),
-
Doesn't make it particularly more difficult to query data. Even if there are cases (for example a search) where the website would need products without carrying too much about the categories, but still considering to what shop they belong, making two joins instead of one is not a big deal.
Why was this third party database done in the way it's done? Are there benefits from this approach?
Best Answer
There should be a single foreign key referencing two columns in categories.
Think about it this way. Conceptually, normalization starts with a single relation that contains all the attributes.
It should be clear that the only candidate key is {shop_name, category_name, product_name}. It's in 5NF. There's no redundant data.
Replacing text with ID numbers won't improve that.
A design that requires you to chase ID numbers through a hierarchy of tables is an anti-pattern. It's like building an IMS database in SQL. (IMS was a problem the relational model intended to solve.)