Need Help With BCNF Decomposition Problem

database-designnormalizationoptimizationrelations

I'll try and keep it as quick and concise as I can: I'm currently a novice developer who's super eager to learn Database Normalization and Optimization so I can up my skill in Full-Stack development, so bare with me please. I'm trying to get good at spotting the different kinds of functional dependencies so I can get better at making optimal/appropriate relational models/schemas, and I'm currently doing an assignment on 3NF and BCNF decomposition (possibly NF4?) that I'm struggling with.

The Assignment has 2 parts and I'm fairly certain that I've properly completed the first Part 1, but I'm completely confused on the Part 2.

Here is the relation we are normalizing called LOTS:
Prompt for Relation called LOTS
Diagram of LOTS relation

Part One was to break the Relation down into BCNF form if possible, and here is my work for that:
My Decomposed Relations of original LOTS in BCNF

The current R1 relation is the LOTS relation for Part 2.

Here is the prompt for Part2:
PART 2

It seems to me that, since the AREA attribute has unique sets of values that can determine the COUNTY_NAME that the new introduces functional dependency is AREA -> COUNTY_NAME, which would leave the LOTS relation in 3NF. Thank you for anyone whiling to help a new comer learn. Cheers!

Best Answer

You solved your assignment, so not much more to add. However, because you mentioned that you are

"super eager to learn database normalization and optimization so I can up my skill in full-stack development,"

and because I am mostly stuck indoors these days -- I will demonstrate a bit different method to achieve the goal. This method will serve you well for your personal and business projects, not so much for db-related exams and textbook assignments.

So here is the task, we have relvar (table) with attributes, and some mythical creature called (probably) database designer has provided us with functional dependencies.

lots { PROPERTY_ID
     , COUNTY_NAME
     , LOT_NO
     , AREA
     , PRICE
     , TAX_RATE
     }

COUNTY_NAME, LOT_NO -> PROPERTY_ID
PROPERTY_ID -> COUNTY_NAME, LOT_NO
PROPERTY_ID -> COUNTY_NAME, LOT_NO, AREA, PRICE, TAX_RATE
COUNTY_NAME -> TAX_RATE
AREA -> PRICE

The idea now is to verbalize this and write down (simple) predicates and associated constraints so that the problem can be reasoned about in a natural language using logic.

A simple predicate can not be decomposed without losing information, its matching relvar is in 6NF.


[p 1] County COUNTY_NAME exists.

(c 1.1) County is identified by COUNTY_NAME.

county {COUNTY_NAME}  -- p 1
   KEY {COUNTY_NAME}  -- c 1.1

[p 2] Lot number LOT_NO in county COUNTY_NAME exists.

(c 2.1) Lot is identified by COUNTY_NAME, LOT_NO pair.

county_lot {COUNTY_NAME, LOT_NO}  -- p 2
       KEY {COUNTY_NAME, LOT_NO}  -- c 2.1

[p 3] Lot number LOT_NO in county COUNTY_NAME is assigned property number PROPERTY_ID.

(c 3.1) For each lot and county pair, that lot lot and county pair is assigned exactly one property number.

(c 3.2) For each property number, that property number is assigned to exactly one lot and county pair.

property {COUNTY_NAME, LOT_NO, PROPERTY_ID} -- p 3
     KEY {COUNTY_NAME, LOT_NO}              -- c 3.1
     KEY {PROPERTY_ID}                      -- c 3.2 

-- this one can be decomposed to
-- {PROPERTY_ID, COUNTY_NAME} {PROPERTY_ID, LOT_NO}
-- but leaving it as is to simplify.

[p 4] County COUNTY_NAME has tax rate of TAX_RATE.

(c 4.1) For each county, that county has exactly one tax rate; for each tax rate, it is possible that more than one county has that tax rate.

county_tax {COUNTY_NAME, TAX_RATE} -- p 4
       KEY {COUNTY_NAME}           -- c 4.1  

[p 5] Lot area of AREA is priced at PRICE.

(c 5.1) For each lot area, that lot area is priced at exactly one price; for each price, more than one lot area may be priced at that price.

area_price {AREA, PRICE} -- p 5
       KEY {AREA}        -- c 5.1

[p 6] Property PROPERTY_ID has area of AREA.

(c 6.1) Each property is of exactly one area; for each area, it is possible that more than one property has that area.

property_area {PROPERTY_ID, AREA} -- p 6
          KEY {PROPERTY_ID}       -- c 6.1

At this point, all these relvars represent simple predicates and are in 6NF (well p3 in 5NF). Now I can decide to leave them as is (logically ok) or to combine some of them based on keys, but staying in 5NF.

county_tax {COUNTY_NAME, TAX_RATE}  -- p 1,   p 4
       KEY {COUNTY_NAME}            -- c 1.1, c 4.1

area_price {AREA, PRICE}  -- p 5
       KEY {AREA}         -- c 5.1

property_ { COUNTY_NAME   -- p 2, p 3, p 6
          , LOT_NO
          , PROPERTY_ID
          , AREA
          }
      KEY {COUNTY_NAME, LOT_NO}  -- c 2.1, c 3.1
      KEY {PROPERTY_ID}          -- c 3.2, c 6.1

First two are in 6NF, the third one in 5NF.

This method requires understanding of: 1NF, 6NF, 5NF; or 1NF only, as long as redundancy and lossless decomposition join is properly understood. In other words, you can have your DB in 5NF even if you do not know what that means, as long as you can clearly define predicates and constraints.

So how can one know that relvar is 5NF, if one does not know what it means? Simple, it is not abut being in this-and-that NF, but about removing redundancy.

The 5NF is final NF as far as removing redundancy is concerned.
A relvar (table) is in 5NF iff decomposing it would not remove any redundancies.

Or (same meaning) if at least one of these holds:

  • There is no redundancy.
  • There is redundancy, but it can not be removed by decomposition.
  • It is not possible to decompose table without losing information.
  • It is not possible to decompose table at all.

Just make sure that when you reason abut redundancy, you reason about predicate and constraints, not about few rows of sample data.

To recap, instead of decomposing it by an algorithm or following functional dependencies, the method "explodes" the predicate of the initial relvar into set of simple predicates (6NF relvars) and then composes them, making sure to stay in 5NF.
It uses logic and natural language, a huge advantage when one needs to communicate with analysts and business people.