Entity Relationship Design; The best scenario

database-designerdnormalization

TLDR; What would be the best design choice for the scenario below and how would each design react at extremely large data volume?

I have experience working with large government database systems servicing 24/7 data collection, processing and reporting. It was fun to poke around all the differently designed schemas and get an idea of how all these features designed by all these different people somehow mishmashed into a working solution. Like I'm sure a few of you know, fun poking times is something of a luxury and most cycles were spent pragmatically, keeping the system alive rather than improving the design.

I have been modelling a new database system and would like to get some thoughts on how you would go about relating these entities.


Scenario: Clients, employers and practitioners require phone numbers.

We have four entities:

  • CLIENT
  • EMPLOYER
  • PRACTITIONER
  • PHONE

Our initial entities

Phone numbers are classified by the technology used to receive calls, technology used informs data format constraints.

  • Landline
  • Fax
  • Mobile

A description field indicates the primary usage of the phone; Home, work, etc.


Ways to relate these entities

1. Denormalize PHONE into CLIENT, EMPLOYER, PRACTITIONER

Let's start with bad design and go from there. De-normalize all the phone numbers!

De-normalized tables

  • 3 entities, no relationships
  • Must create new columns if a new phone number is required; Leading to CLIENT.Phone1Number, EMPLOYER.Phone2Type, PRACTITIONER.Phone3Description;
  • or limit the number of phone numbers that can be entered for each entity
  • Redundancy, lots of overhead
  • Doing bulk-updates or maintenance would become tedious

Conclusion: select * from 'no_thanks';


2. Build the nth Bridge; Rename the database Pittsburgh

For each entity that can have a phone number related, create a bridge entity to bring them together.

Bridge Tables for Days

  • 7 entities, 6 relationships
  • Data redundancy; PhoneNumber populated in multiple locations
  • Can use bridge table if only PhoneNumber is required; Does not require JOIN to PHONE
  • If a new entity, FAMILYMEMBER, requires a phone number, a new bridging table must be created

Conclusion: Maybe, depends on the difficulty of maintaining the relationships


3. A Quasi "Star Schema"

Modify PHONE to include Foreign Keys of CLIENT.ClientID, EMPLOYER.EmployerID, PRACTITIONER.PractitionerID.

Another dimension

  • 4 entities, 3 relationships
  • If one of these keys is populated, the others cannot be populated. (No case when a client, employer, practitioner would have the same phone number)
  • If a new entity, FAMILYMEMBER, requires a phone number, a new attribute must be created in PHONE
  • All FK attributes with NULL by default

Conclusion: Seems like the strongest design in my opinion

Best Answer

If I really have to choose, I'd say your 2nd choice (build nth bridge) will be my favorite as it is clean and stick to rules. While your 3rd choice, though it reduces the # of tables, it actually brings "redundant" columns with null values in the Phone table and this design is "volatile" in the sense that if you need to add a new phone owner type, you need to add an additional column in the [Phone] table.

In an alternative way, I would create a "bridge" table like the following (using t-sql as an example):

    create table PhoneOwner (ID int identity primary key
    , OwnerID int 
    , OwnerType varchar(20));

So for the [Phone] table, I will add one column PhoneOwnerID which references to [PhoneOwner] table's [ID] column. OwnerType column can have values as "Client", "Employer" or "Practitioner" in this case, or other values in future. (Of course, we can create [OwnerType] table to hold these values, and put a FK in [PhoneOwner] table to reference this [OwnerType] table)

The problem with this alternative design here is the integrity check, for example if OwnerType = 'Client', we need to ensure OwnerID's value does exists in [Client] table, though this can be done via application layer or via trigger to implement such business rules.

But I have to admit this is a compromise to strict integrity (via FKs) rule, the good side is that it somehow achieves better flexibility, and less mess in table design which will cause many null values in "redundant" columns.