Normalizing data past initial db design phase

database-design, denormalization, normalization

I wanted to pick everyone's brains to better understand when it makes sense to normalize a table that carries redundant data. I recently worked on a project at my startup and really questioned afterwards whether it was a good idea. The problem was something like this.

Our web app is an e-commerce app. Our db contains a table that looks like this:

Store Product Table
-----------------------
- product_name
- price 
- store_product_id
- coupon_name
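
In SQL terms that's roughly the following (PostgreSQL-style types are my guess; the real columns may differ slightly):

-- Denormalized table: product and coupon facts live side by side.
CREATE TABLE store_product (
    pk               serial  PRIMARY KEY,
    product_name     text    NOT NULL,
    price            numeric(10, 2) NOT NULL,
    store_product_id integer NOT NULL,
    coupon_name      text
);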

The data looks something like this:

pk, product_name, price, store_product_id, coupon_name
1, 'Orange Juice', 10.0, 2, 'ORANGE15PERCENT'
2, 'Orange Juice', 10.0, 2, 'ORANGE50PERCENT'
3, 'Orange Juice', 10.0, 2, 'ORANGE70PERCENT'
4, 'Milk', 9.0, 89, 'MILK20PERCENT'
5, 'Milk', 9.0, 89, 'MILK50PERCENT'
6, 'Ham', 5.0, 45, ''
...
300, 'Jacket', 100.0, 90, 'JACK50PERCENT'

This is a relatively small table (~200 records). As you can tell, there is a ton of redundancy. I saw that redundancy and broke the table out into two tables, a Product table and a Coupon table (sketched below). That eliminated all the redundancy. I felt proud at first, but I wonder whether it was really necessary. Should one always remove redundancy in a table by breaking it apart into two tables? Please note that this decision was made well past the initial db design phase. It's probably an easy decision to make when you are first designing things, but what about when you've already built a codebase around a redundant table?
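
For reference, the split I ended up with looks roughly like this (table and column names are approximate, types assumed):

-- One row per product; name and price are stored exactly once.
CREATE TABLE product (
    store_product_id integer PRIMARY KEY,
    product_name     text    NOT NULL,
    price            numeric(10, 2) NOT NULL
);

-- One row per coupon, pointing at the product it applies to.
CREATE TABLE coupon (
    coupon_id        serial  PRIMARY KEY,
    coupon_name      text    NOT NULL,
    store_product_id integer NOT NULL REFERENCES product (store_product_id)
);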

Note: I realize there are pros to keeping redundant data when a table is read-heavy, but since this is a small table I don't think that's especially relevant here. Open to being proven wrong, though.

Best Answer

Normalization exists to remove update anomalies. If a single fact changes in the real world, a single column of a single row should change in the model. In this way one source of inconsistency is eliminated. Pragmatically, normalization can be relaxed if:

  • there are no updates
  • all the developers understand the situation and are competent to deal with it
  • there are good QA processes in place
  • inconsistencies are of little cost and easily fixed

... amongst other considerations.
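
To make the anomaly concrete with the table from the question (names and SQL flavour assumed): in the denormalized design a single fact, the price of orange juice, lives in three rows, so a routine price change has to touch all of them, and a partial update leaves the table contradicting itself. In the normalized design the same fact lives in exactly one place:

-- Denormalized: one fact ("orange juice now costs 12.00") is spread over three rows.
-- Miss any copy and the table holds two different prices for the same product.
UPDATE store_product
SET    price = 12.0
WHERE  store_product_id = 2;   -- must hit every duplicated row

-- Normalized: the same fact is a single column of a single row.
UPDATE product
SET    price = 12.0
WHERE  store_product_id = 2;   -- exactly one row changes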

So, should you have normalized this particular table? In best DBA tradition I'll say "it depends."

Likely the denormalized table is causing problems right now: maybe the code is more complex, silly errors slip in, or queries run slower. But there is a price and you know what it is. There is also a risk that future events will incur some cost because of the denormalized schema. Maybe the wrong coupon is applied, or too many rows get updated and revenue is lost. You know your business; you know what may go wrong and how severe it would be.

Changing the schema and application also has a cost: direct effort, plus the opportunity cost of features not implemented, releases delayed, and so on.

I'd say: if the discounted future cost of the current situation is greater than the cost of making the change, do it. If not, add this item to the pile labelled "technical debt", monitor the situation, and get on with other stuff.