UNF to 3NF is this relation workable

database-designnormalization

I've inherited a flat table of data consisting of around 2 million records. Looking at the data its already in 1NF, i'd like to get this into 3NF.

The data is Australian so suburb can't determine state as suburb names exist in more than one state, plus one postcode often covers more than one suburb. Meaning suburb, state, postcode should be a super key?

There are multiple companies at different sites, some have the same phone number (freephone 1300 or 1800) and others have distinct phone numbers, every record has a fax number and most have a phone number. ContactName is mostly blank so phone and fax numbers relate to companies not contacts.

Selection of records by users is typically RLIKE on the field BusinessTypeDescription coupled with geographic criteria which of course means a full table scan on every query and therefore super slow results.

Is the proposed table structure below in 3NF and is it practical?

EXISTING TABLE
-----------------------
ID (PK)
CompanyName
Address
Suburb
State
Region
Postcode
Country
ContactName (this field is mostly empty)
Phone
Fax
Mobile
Email  (this field is mostly empty)
Website  (this field is mostly empty)
PremiseType
BusinessTypeDescription
ANZSICCode
ANZSICDescription
RecordSource

This is the proposed structure…

COMPANY
---------------
ID (PK)
CompanyName
Address
LocationID (FK)
PhoneID (FK)
FaxID (FK)
MobileID (FK)
ContactName
PremiseType
BusinessTypeID (FK)
ANZSICCode (FK)
Email
Website
RecordSource


LOCATION        COUNTRY         REGION
--------------  --------------  ------------
ID (PK)         ID (PK)         ID (PK)
Suburb          Country         Region
State
Postcode
RegionID (FK)
CountryID (FK)

PHONES
------------
ID
Phone

MOBILES
------------
ID
Mobile

FAXES
------------
ID
Fax

BUSINESSTYPE
----------------------
ID
BusinessTypeDescription


ANZSICS
----------------------
ANZSICCode
ANZSICDescription

Best Answer

You need to learn the difference between repetition and redundancy. Sometimes a field value is repeated coincidentally. This kind of repetition cannot be normalized out.

Normalization is about studying the functional dependencies between key and non-key elements. It is not about putting everything that might occur twice into a table with a surrogate key.

For example, you say that a phone number may be used by multiple companies and have modeled phone number as an independent table. This would prevent update anomalies if when one company changed its number all of the other companies using that number changed theirs at the same time and in the same way. Does any of that actually make business sense? It doesn't to me.

Also, normalizing geography into a hierarchy is a questionable proposition. However, you haven't even normalized into a hierarchy. Is region ID not determined by country ID? If so, your LOCATION table is not 3NF. Similarly, are there postcodes that bridge states? I don't know the Australian system well enough, but I know that in Canada, the US and the UK this doesn't happen.

You've designed everything like its a star schema and company is the fact. Facts in star schemas are usually transactions, not static entities. Transactions don't tend to change their properties, because they usually represent something that actually happened and one doesn't generally go back in time to change history. Static entities live for a long time in your data (maybe forever) and change their properties fairly often. Star schema is not a good, efficient model for this type of data.

What you should do is run queries on the frequency distributions of each column and on combinations of columns that you think may be keys combined with columns that you know aren't keys. This will help you test your assumptions about which columns are truly able to act as keys. Then you should temper those findings with some common sense around what could potentially happen to your data in terms of changes to column values. Once you know your actual functional dependencies found in your data, then follow the relatively mechanical process of moving your relation to 3NF.