Why is data integrity not possible when using denormalization?

database-design denormalization

This tutorial says the following:

For proponents of denormalization, the thinking is as follows: normalization creates more tables as you proceed towards higher normal forms, but more tables mean there are more joins to be made when data is retrieved, which in turn slows down your queries. For that reason, to improve the performance of certain queries, you can override the advantages to data integrity and return the data structure to a lower normal form.

Why is data integrity not possible when using denormalization? I mean, wouldn't the DBMS make sure that all redundant copies of some piece of data get updated automatically when that piece of data is updated?

In Oracle, for example, denormalization can be implemented using materialized views: when the original tables are modified, the data in the materialized view is updated automatically (you can also choose not to have it refreshed automatically). And even if you don't use materialized views (if your DBMS doesn't support them, for example) and implement the denormalization manually, you can still write triggers yourself to keep the redundant copies consistent. So using denormalization does not necessarily mean that data integrity is impossible.
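For instance, a denormalized read structure that Oracle keeps in sync by itself might look roughly like this. The table and column names are made up for illustration, and a fast-refresh-on-commit join view has further requirements (materialized view logs, rowids in the select list) that are only sketched here:

    -- Illustrative only: a join materialized view that Oracle refreshes on commit.
    CREATE MATERIALIZED VIEW LOG ON employees WITH ROWID;
    CREATE MATERIALIZED VIEW LOG ON contacts  WITH ROWID;

    CREATE MATERIALIZED VIEW employee_contacts_mv
      REFRESH FAST ON COMMIT
    AS
    SELECT e.rowid AS emp_rid, c.rowid AS con_rid,
           e.employee_id, e.last_name, c.phone, c.address
    FROM   employees e, contacts c
    WHERE  c.employee_id = e.employee_id;

Alternatively, you could keep a plain denormalized table and maintain the redundant columns yourself with triggers on the base tables.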

Best Answer

Data integrity is possible in both normalized and denormalized databases. What is different in the two cases is the number of different ways an application program can write bad data to the database.

Let's take a simple case of bad data. In the database, it says that employee with id 123 has last name "Brown". But in the real world, employee 123 has last name "Browne". This is bad data no matter how you look at it. But let's go a little deeper.

In one database, a normalized one, the last name is stored in precisely one place, in the Employees table. It's either right or it's wrong. And, if it's wrong, it's up to someone who knows the right answer to fix the data in the database. This is part of data management, although not a very exciting part to those of us who work in automation.

In another database, a denormalized one, the last name of employee 123 is stored in two tables, the Employees table and the Contacts table. The Contacts table is about phone numbers and addresses. In the normalized database, there is no Contacts table; it is a view instead. Now, there are more possibilities. The last name could be right in both places, or it could be wrong in both places. But there is another failure mode: it could be right in one place but wrong in the other. Now the database is not only wrong, but self-contradictory. This is simply not possible in the normalized database, because the last name is stored only once.
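To make that concrete, here is a rough sketch of the two designs. The names are illustrative, and the denormalized table is suffixed _denorm only so both versions can appear in one sketch:

    -- Normalized: the last name lives only in Employees; Contacts is a view.
    CREATE TABLE employees (
      employee_id INTEGER PRIMARY KEY,
      last_name   VARCHAR(50) NOT NULL
    );

    CREATE TABLE phones (
      employee_id INTEGER NOT NULL REFERENCES employees (employee_id),
      phone       VARCHAR(20),
      address     VARCHAR(200)
    );

    CREATE VIEW contacts AS
    SELECT e.employee_id, e.last_name, p.phone, p.address
    FROM   employees e
    JOIN   phones p ON p.employee_id = e.employee_id;

    -- Denormalized: Contacts is a real table that repeats last_name,
    -- so the two copies are free to disagree.
    CREATE TABLE contacts_denorm (
      employee_id INTEGER NOT NULL REFERENCES employees (employee_id),
      last_name   VARCHAR(50),   -- redundant copy of employees.last_name
      phone       VARCHAR(20),
      address     VARCHAR(200)
    );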

This is an example of what is called harmful redundancy. Second Normal Form through Fifth Normal Form are about eliminating cases of harmful redundancy by imposing rules that cause each fact to be stored in only one place. This doesn't guarantee data integrity, but at least it rules out this kind of inconsistency. You still have to manage data to get it all right, but now the DBMS is helping you manage the data correctly.

Performance is a whole other story. It may be the case that one of the databases outperforms the other for certain kinds of traffic. It's even possible that one of the databases is faster for updates while the other is faster for queries. But great performance combined with wrong answers may be catastrophic in some cases and acceptable in others.

In the example I've given, the detection of wrong data is fairly trivial, even if it's somewhat tedious. In the real world, there are examples where wrong data detection is well nigh impossible unless you get all the help you can from the DBMS. Normalization can help in this regard.
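In the denormalized design sketched above, for example, the self-contradictory rows are at least easy to find with a query like this (again, the names are illustrative):

    -- Find employees whose two last_name copies disagree.
    SELECT e.employee_id,
           e.last_name AS last_name_in_employees,
           c.last_name AS last_name_in_contacts
    FROM   employees e
    JOIN   contacts_denorm c ON c.employee_id = e.employee_id
    WHERE  e.last_name <> c.last_name;

What the query cannot tell you is which of the two values is the right one; someone who knows the real world still has to decide that.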

Well, why don't you just make sure that all the application code always does the right thing? In the first place, the amount of application code grows dramatically as the scope of the mission increases. In the second place, data errors can creep in through the use of interactive SQL, perhaps by the DBA. Writing defensive application code is a good thing, but for one reason or another, it might not get done.

Sorry this is so long, but I needed to give you an idea about what's really going on here. Data management is a never-ending battle. Finding and cleaning up wrong data is just part of that battle. It's a dirty job, but somebody has to do it. A good DBMS, coupled with normalization, can help.