I often come up against the following challenge, and end up de-normalizing my table design in order to enforce data integrity. I would be interested to hear if there is a different way of modelling the data that is both normalized and enforces integrity.
Here is a typical (simplified) example:
create table [ProductType]
(
[ProductTypeId] INT identity(1,1) not null,
[ProductTypeName] nvarchar(100) not null,
constraint [PK_ProductType] primary key ([ProductTypeId])
)
create table [Product]
(
[ProductId] INT identity(1,1) not null,
[ProductTypeId] int not null,
[ProductName] nvarchar(100) not null,
constraint [PK_Product] primary key ([ProductId]),
constraint [AK_Product] unique ([ProductId], [ProductTypeId]),
constraint [FK_Product_ProductType] foreign key ([ProductTypeId]) references [ProductType]([ProductTypeId])
)
create table [ProductTypeProperty]
(
[PropertyId] INT identity(1,1) not null,
[ProductTypeId] int not null,
[PropertyName] nvarchar(100) not null,
constraint [PK_ProductTypeProperty] primary key ([PropertyId]),
constraint [AK_ProductTypeProperty] unique ([PropertyId], [ProductTypeId]),
constraint [FK_ProductTypeProperty_ProductType] foreign key ([ProductTypeId]) references [ProductType]([ProductTypeId])
)
create table [ProductPropertyValue]
(
[ProductId] INT not null,
[PropertyId] INT not null,
[ProductTypeId] int not null,
[PropertyValue] nvarchar(100) not null,
constraint [PK_ProductPropertyValue] primary key ([ProductId], [PropertyId]),
constraint [FK_ProductPropertyValue_ProductTypeProperty] foreign key ([PropertyId], [ProductTypeId]) references [ProductTypeProperty]([PropertyId], [ProductTypeId]),
constraint [FK_ProductPropertyValue_Product] foreign key ([ProductId], [ProductTypeId]) references [Product]([ProductId], [ProductTypeId])
)
SET IDENTITY_INSERT [dbo].[ProductType] ON
INSERT [dbo].[ProductType] ([ProductTypeId], [ProductTypeName]) VALUES (1, N'Clothing')
INSERT [dbo].[ProductType] ([ProductTypeId], [ProductTypeName]) VALUES (2, N'Drink')
SET IDENTITY_INSERT [dbo].[ProductType] OFF
SET IDENTITY_INSERT [dbo].[ProductTypeProperty] ON
INSERT [dbo].[ProductTypeProperty] ([PropertyId], [ProductTypeId], [PropertyName]) VALUES (1, 1, N'Colour')
INSERT [dbo].[ProductTypeProperty] ([PropertyId], [ProductTypeId], [PropertyName]) VALUES (2, 1, N'Size')
INSERT [dbo].[ProductTypeProperty] ([PropertyId], [ProductTypeId], [PropertyName]) VALUES (3, 2, N'Volume')
SET IDENTITY_INSERT [dbo].[ProductTypeProperty] OFF
SET IDENTITY_INSERT [dbo].[Product] ON
INSERT [dbo].[Product] ([ProductId], [ProductTypeId], [ProductName]) VALUES (1, 1, N'T-shirt')
INSERT [dbo].[Product] ([ProductId], [ProductTypeId], [ProductName]) VALUES (2, 2, N'Milk')
SET IDENTITY_INSERT [dbo].[Product] OFF
INSERT [dbo].[ProductPropertyValue] ([ProductId], [PropertyId], [ProductTypeId], [PropertyValue]) VALUES (1, 1, 1, N'Red')
INSERT [dbo].[ProductPropertyValue] ([ProductId], [PropertyId], [ProductTypeId], [PropertyValue]) VALUES (1, 2, 1, N'XL')
INSERT [dbo].[ProductPropertyValue] ([ProductId], [PropertyId], [ProductTypeId], [PropertyValue]) VALUES (2, 3, 2, N'1 pint')
-- NOTE THAT THE FKS ON [PRODUCTPROPERTYVALUE] MEAN YOU CANNOT RUN EITHER OF THESE
-- WHICH TRY TO ASSIGN A PROPERTY TO A PRODUCT THAT DOESN'T BELONG TO ITS TYPE
INSERT [dbo].[ProductPropertyValue] ([ProductId], [PropertyId], [ProductTypeId], [PropertyValue]) VALUES (1, 3, 1, N'Red')
INSERT [dbo].[ProductPropertyValue] ([ProductId], [PropertyId], [ProductTypeId], [PropertyValue]) VALUES (2, 2, 1, N'XL')
Here's what I am trying to model:
- A product has a product type (e.g. "clothing", "drink" etc)
- A product type has a number of properties (e.g. "clothing" has "colour" and "size")
- A product has property values for any/all of the properties belonging to its type
- A product cannot have property values for properties not belonging to its type
Point #4 is the one that causes the challenge. In order to create foreign keys to achieve this ([FK_ProductProperty_ProductTypeProperty] and [FK_ProductPropertyValue_Product]) I have de-normalized and added "unnecessary" unique constraints:
- Added [ProductTypeId] to [ProductPropertyValue]
- Added a unique constraint on [PropertyId], [ProductTypeId] to [ProductPropertyValue]
- Added a unique constraint on [ProductId], [ProductTypeId] to [Product]
However it seems impossible to achieve all 4 of the above points without these … is it?
This challenge seems to come up when my design departs from a simple branching "snowflake" and there needs to be "loops" in the relationships:
P ---------> PT
^ ^
| |
| |
PPV ------> PTP
Note: this is a simple example with a 4 tables. With a real-world design with many levels of related tables, the issue becomes magnified (unique constraints on multiple columns, more than one "unnecessary" unique constraint per table etc).
Best Answer
This sounds like a contradiction. It's normalizing that usually enforces integrity. And in my opinion, you have a misunderstanding. You did not de-normalize your design. It is normalized just fine.
The specific issue - I call it diamond shape - appears quite often, although many do not realize or see it as a problem and do not enforce the constraint that you have enforced with your design.
And I don't think there is any other way, ignoring some details. A unique key of the "grandparent" table (
PropertyType
in this case), has to be part of unique constraints of the "parent" tables (Product
,Property
here) and to appear in the "child" table (ProductPropertyValue
), exactly as you did.I disagree about the "unnecessary". They are needed, first for the
FOREIGN KEY
constraints to be defined and work.Your unique constraints are just fine and needed for the integrity reasons you have described:
to ensure that a
ProductPropertyValue
is related to aProduct
and aProperty
that both (Product
andProperty
) belong to the samePropertyType
.The redundant in the design are the primary key constraints on the identity columns of
Product
andProperty
. They can be removed without any loss of integrity. I know it's more than common that an automatically incrementing (identity
in SQL Server) column has also aUNIQUE
orPRIMARY KEY
constraint. But it doesn't really need it. It may be good for performance reasons to keep that constraint, though. You may also use the column as foreign key in some other table, where the there is no connection with property types (and thus thePropertyTypeId
is unnecessary). But from a logical design and normalization point of view, there is no normalization issue. It's just an implementation, physical design detail.Yes, it is. But you have to start from entities, attributes and functional dependencies. When you add - before doing that - surrogate keys, you are tempted to use them everywhere as foreign key columns, considering that DBMS provides with the extra feature of Uniqueness. So, you may miss using some other unique column or combination. For example,
(ProductTypeId, ProductName)
might be able to identify products, too, so it could have a unique constraint, I don't know the details of your business rules. If it was, it could be used instead as a foreign key in theProductPropertyValue
. And similarly forProperty
.Whether that would be more efficient is another matter, as long columns like
nvarachar(100)
are not the best fit for indexes. So, even if you had that unique constraint, you might end up using what you chose in the first place, for efficiency reasons. (Consider 4+4+4=12 vs 4+200+200=404 if we had the two nvarchars fromProduct
andProperty
.) Choosing between a 12-bytes and a 404-bytes index is not hard choice. I'd think that all would go with the 12 size, even if theProductPropertyValue
is technically not in 3NF.And I say "technically" and stress again that the only reason that
ProductpropertyValue
looks as if it is not in 3NF, is because the DBMS provided identity columns (ProductId
andPropertyId
forProduct
andProperty
) have the nice extra feature of Uniqueness (which actually isn't guaranteed in SQL Server. You could setIDENTITY_INSERT ON/OFF
or some similar property in other DBMS and push values that are not unique in identity columns. Uniqueness is only guaranteed by the unique constraints you have in place.)That's why I call it "diamond shape"!
And please note that there is no loop there!
- Just start anywhere you like the follow the arrows. Can you return to where you started?
- No. So there are no loops.
Well, that is a problem. When the rules are complicated and involve many entities in multiple levels, the result is a complicated model, too, with multi-columns unique constraints and multi-column foreign keys. This may have performance implications as well as many DBMS do not like very much multi-column constraints (eg. the indexes are wider, the optimizers may not perform so well or not recognize at all multi-column foreign keys and miss query rewriting rules, etc.). And lets not talk about ORMs, where most of them expect a single column named
id
as the primary key in all tables ...There is not much you can do about it. You either enforce the constraints with often complex model or remove some constraints from the design and simplify it.
You can also test several designs with various levels of missing rules, balancing performance, ease of use and development and enforcing constraints. But at least you'll know where exactly you simplified and which constraints are not enforced at the database level.
See a similar problem, discussed in another question: Best data modelling approach to handle redundant foreign keys in relational model, where @MDCCL has a great answer with many details and where he suggests the same solution as your design. The similarity is obvious just by looking at the graphs:
The details match as well, if we check the columns and the constraints of both problems. The difference is that MDCCL started from a logical design and you won't see any
identity
or other such implementation details that should be considered at the next phase, of physical design, after we have completed the logical design and normalization.