Normalizing data with multiple products to a single ID

database-designnormalization

SALESORDER

salesorderline

How can I normalize this data to remove the redundant tuples in the salesorderline database? I already normalized the first one, but cant seem to figure out the second. Also, are my databases already in third normal form? I believe they are, however, would like extra confirmation from most knowledgeable people.

Best Answer

There is no redundancy. You have the FK back to the sales order, how many items were ordered, the per item charge and the total for the line. You are, however, missing something rather important. What item was ordered? The top line show an order of 10 somethings and the next line shows an order for 5 something elses.

Another feature you may want to add is a way to uniquely identify each order detail. There first three items are details for order number 1111. Beyond that, there is no way to differentiate one line from another. That is to say, there in no natural key.

Such detail tables create a differentiator in several ways. One way is to add an incremental value on each line. This can also be done in a couple of ways.

  1. Give the first line inserted for each order a value of one and increment each additional line by one. The disadvantage of this is, when another line must be added later, the highest value so far must be read in to see where to restart incrementing.
  2. Implement a global incremental value. This has the advantage it can usually be automatically generated by the system as a new record is inserted. Of course, line 1 of order 1111 may be given the value 1, line 1 of the order 1112 may be given the value 2, line 2 of order 1111 the value 3, line 1 of order 1113 the value 4 and so on. Essentially, the line value would be a semi-random value. This is fine as these serve as differentiators only. They are only required to be distinct, not grouped in any logical manner. When using this method, use the composite (SO_ID, Increment) as the key even though Increment will be unique on its own. The indexing will improve queries.

Another way is to use the missing field, the item part number or SKU or whatever. The advantage over a generated value is

  1. an additional increment field need not be created.
  2. you can't accidentally enter the same part number more than once. The disadvantage is you can't deliberately enter the same part number more than once -- or instance, if the customer later orders more of the same item. You would have to update the existing line for that item with the new quantity. However, this may be exactly the behavior you want.