How to quantify the business value of normalization

database-designdatabase-recommendation

My company is developing an internal application with a SQL server database which was started earlier by another developer.
Now I am Looking at the model,the developer has stopped at 1NF. I proposed going all the way to 3NF but my manager said it would add no business value.
Is there a way to quantify the business values of normalization, specifically going to 3NF?

Best Answer

The actual costs are difficult to quantify as they are often diffused. Often they appear as slower transactions. For a system used rarely, this may not be significant. For a system used for an hour a day, that is taking twice as long as is required the cost would be about 100 hrs a year per person using the system. Slower transactions may also have higher error rates due to users being distracted or interrupted while processing transactions.

Failure to normalize the data often results in in consistent data through-out the application. Updates to some records may be missed, or incorrect records may be updated. Inconsistent data can have serious business and/or legal implications. Such problems can reduce the reputation of the business.

Updates to non-normalized records may take many times longer than with normalized data. (That is time someone is waiting for the system rather than moving to the next task.). In some cases the update will fail, or lock the table for excessively long periods of time.

Related Solutions

Can a 2NF relation be M:M

The normal forms don't really have anything to do with many-to-many relationships. If you happen to lose some as a byproduct of the normalization process, that's fine, but you won't generally do so. If we consider that we have two tables: Salesman and Product which each have ID fields as their primary key and we have a third table called Specializes which shows which Salesmen specialize in selling which products. This Specializes table would represent the many-to-many relationship since each salesman can specialize in multiple products and each product can be specialized in by more than one salesmen. It would probably look something like this (excuse the awkward formatting, we can't do real tables on StackExchange):

| SalesmanID | ProductID |
|------------|-----------|
|  1         |  1        |
|  1         |  2        |
|  2         |  1        |
|  2         |  2        |
|  2         |  3        |
|------------|-----------|

Obviously, the lack of nulls and repeated rows means that this table is in 1NF. In this table, the only candidate key is {SalesmanID, ProductID} and as such, there are no non-prime attributes. It also contains no non-trivial functional dependencies. Thus, it is necessarily in 2NF, 3NF, and BCNF. I'm also going to assert without proof that it's in 4NF, 5NF, 6NF, and DKNF (to avoid having to explain all of the details thereof). So really, no normal form removes many-to-many relationships, nor are they meant to. The purpose of normal forms is not to remove many-to-many relationships (and I am not actually clear on why you would want to) but rather to remove potential insert anomalies, update anomalies, and deletion anomalies. The primary role of the normal forms is to ensure that each piece of information is represented in a database table precisely once. Having the same information embedded in multiple places leads to problems. But that has nothing to do with many-to-many relationships.

I think that you mean to be asking a slightly different question about something like situations where a many-to-many relationship is embedded in a table which also tries to contain other information, such as if the table above contained the product name in addition to the product number (where name is functionally determined by number). A table like that would either violate 2NF (if the name did not also functionally determine the number) or Boyce Codd Normal Form (if the name did functionally determine the number).

You could also perhaps be thinking of a different situation: when we have two unrelated 1:M relationships in the same table, such as if we were to add a third column to identify which language or languages each salesman speaks.

| SalesmanID | ProductID | Language |
|------------|-----------|----------|
|  1         |  1        | English  |
|  1         |  2        | English  |
|  2         |  1        | Spanish  |
|  2         |  2        | Spanish  |
|  2         |  3        | Spanish  |
|  2         |  1        | French   |
|  2         |  2        | French   |
|  2         |  3        | French   |
|------------|-----------|----------|

As you can see, that table is quite problematic, since we need 6 entries to express that Salesman 2 specializes in 3 products and speaks 2 languages. This is a fourth normal form violation.

Edit:

Upon clarification, it's clear that what he's asking about is a table like the Specializes table, but with extra information about the salesmen and the products, essentially, a table which contains two entity sets and their many-to-many relationship in a single table. So to answer that question directly, yes, you can have lousy tables like that which are in 3NF. The normal form which guarantees that that won't happen is Boyce-Codd Normal Form (BCNF). Here's an example of a lousy table like that which is vulnerable to all kinds of anomalies (insert, update, and delete), but is in 2NF and 3NF.

| SalesmanName | SalesmanID | ProductID | ProductName |
|--------------|------------|-----------|-------------|
|   Alex       |  1         |  1        |  Thingy     |
|   Alex       |  1         |  2        |  Whatsit    |
|   Barb       |  2         |  1        |  Thingy     |
|   Barb       |  2         |  2        |  Whatsit    |
|   Barb       |  2         |  3        |  Whoosit    |
|--------------|------------|-----------|-------------|

So, looking at this table, it's obviously in 1NF. Further, we can identify the non-trivial functional dependencies very straightforwardly. SalesmanName -> SalesmanID. SalesmanID -> SalesmanName. ProductID -> ProductName. ProductName -> ProductID. Next we need to identify the candidate keys. There are four: {SalesmanName,ProductID}, {SalesmanName,ProductName}, {SalesmanID, ProductID}, and {SalesmanID, ProductName}. As such, we have no non-prime attributes. Thus, we are necessarily in 2NF (no functional dependencies between non-prime attributes) and 3NF (no non-trivial functional dependencies where the left-hand-side is not a super key and the right hand side contains a non-prime attribute). However, we are not in BCNF because there do exist non-trivial functional dependencies whose left-hand-side is not a superkey.

Any similar situation will also always not be in Boyce-Codd Normal form because there will be some non-trivial functional dependency whose left-hand-side is not a superkey. Any table like this will essentially have two entity sets each of whom have some attributes. Basically, it will have a left entity set and a right entity set. The left entity set will have some attributes which uniquely identify each left entity and the right entity set will have some attributes which uniquely identify each right entity. Those will be involved in functional dependencies. However, they will each not be candidate keys because you'll have to combine them to get a candidate key for the whole table. As such, they won't be superkeys and there will be a Boyce-Codd Normal Form violation. So BCNF will stop it cold. Anything less than that will only catch some cases. Really, if you only remember one normal form, it should be BCNF.

Database Normalization – Foreign Key vs Varchar Field

The design choices you describe are not directly related to normalization.

I agree there should be a lookup table.

I think an OrderStatusID value would increase redundancy. The status (text) value presumably already satisfies many of the qualities of a good key: unique, stable, narrow, familiar to users, etc. Referential integrity can be applied to VARCHAR columns, of course! Each application that uses the key can assign it a enum as required and would be responsible for mapping enum values to status (text) values. This would presumably make the lookup table a single column, 'all-key' table (and therefore would satisfy 6NF, the highest normal form ;)

[If OrderStatusID is an attribute in the Order table then it would not be in 6NF but, as I say, I don't think you are actually asking about normalization at all.]

Best Answer

Related Solutions

Can a 2NF relation be M:M

Database Normalization – Foreign Key vs Varchar Field

Related Question