Approach to Analyzing Database Requirements

database-design, normalization, relational-theory

When presented with requirements for a database-driven system, I always find it useful to imagine a world without computers, where records were kept in filing cabinets and data was collected by filling in forms. From those forms I build a table of unnormalized data, and then I normalize it up to 3NF (if I can).

The difficulty I often run into with this approach is deciding how big each form should be. For example, let me take the scenario from my last question and extend it further.

Consider an animal charity that rehomes rescued pets with new, loving owners. This charity works slightly differently: it has three variations on "Person".

Temporary Homeowner [TH] – Takes care of a pet temporarily until a new permanent homeowner is found.
Permanent Homeowner [PH] – Can view the files of all available pets at the shelter and pick one to adopt permanently.
Charity Sponsor [CS] – Donates money to the charity because of the good work it does!

Slightly different information is recorded for each type of person. The [TH] comes to the charity and registers to help out, stating which types of animal he/she is willing to look after temporarily and the maximum number of animals he/she is allowed to house at one time.

The [PH] needs his/her details recorded so that the charity can call him/her back at a convenient time to see how the poor animal is settling in.

The [CS] also needs his/her details stored so that the charity can send a newsletter and/or a thank-you note acknowledging the help and support and explaining how the money is being used to do good.

So, in a world before computers, I think there would be a single "person" registration form. To highlight my issue: how big should this form be, and what information should it capture? Here is what I currently envisage the form looking like (mocked up in Word):

[Image: the author's draft "person" registration form]

I suspect that the way I get the unnormalized data into a table will affect how smoothly normalization goes later on.

From the above form, I could put everything into one table, with each form field becoming a column, but that would leave many NULL values in each row, depending on whether the person is a [TH], [PH] or [CS].

I am looking for some guidance on this, and would greatly appreciate some advice on the best approach to take.

Best Answer

You could have a base table to store the common "person" attributes, and then specialized tables for the more specific fields. Example:

person
------
  id
  ref_num
  reg_dt
  addrs_line1
  addrs_line2
  postal_code
  phone_num

th_person
---------
  id
  person_id (FK to person.id)
  max_accepted_at_one_time

animals
-------
  id
  accepted_by_th_person (FK to th_person.id)

ph_person
---------
  id
  person_id (FK to person.id)
  animal_ref_num

cs_person
---------
  id
  person_id (FK to person.id)

donations
---------
  id
  date
  amount
  donor_id (FK to cs_person.id)

This might be a good starting point.
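A minimal sketch of the layout above, run against an in-memory SQLite database through Python's standard sqlite3 module so it is self-contained. The column types, the NOT NULL choices and the sample rows are assumptions added for illustration; the answer itself only lists column names. It shows a temporary homeowner and a sponsor being stored without any unused, type-specific columns.

```python
import sqlite3

# Base table for shared attributes plus one table per person type, as in the
# answer. Types and NOT NULL constraints are assumptions for this sketch.
SCHEMA = """
CREATE TABLE person (
    id INTEGER PRIMARY KEY, ref_num TEXT NOT NULL, reg_dt TEXT NOT NULL,
    addrs_line1 TEXT, addrs_line2 TEXT, postal_code TEXT, phone_num TEXT
);
CREATE TABLE th_person (
    id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(id),
    max_accepted_at_one_time INTEGER NOT NULL
);
CREATE TABLE animals (
    id INTEGER PRIMARY KEY,
    accepted_by_th_person INTEGER REFERENCES th_person(id)
);
CREATE TABLE ph_person (
    id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(id),
    animal_ref_num TEXT
);
CREATE TABLE cs_person (
    id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(id)
);
CREATE TABLE donations (
    id INTEGER PRIMARY KEY, date TEXT NOT NULL, amount NUMERIC NOT NULL,
    donor_id INTEGER NOT NULL REFERENCES cs_person(id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(SCHEMA)

# A temporary homeowner: shared details in person, TH-only details in th_person.
conn.execute("INSERT INTO person VALUES (1, 'P001', '2015-03-01', '1 High St', '', 'AB1', '0123')")
conn.execute("INSERT INTO th_person VALUES (10, 1, 2)")
conn.execute("INSERT INTO animals VALUES (100, 10)")

# A sponsor: person has no TH/PH columns, so nothing is left NULL for them.
conn.execute("INSERT INTO person VALUES (2, 'P002', '2015-03-02', '2 Low Rd', '', 'CD2', '0456')")
conn.execute("INSERT INTO cs_person VALUES (20, 2)")
conn.execute("INSERT INTO donations VALUES (1, '2015-04-01', 50, 20)")

# How many animals each temporary homeowner currently has, against their limit.
rows = conn.execute("""
    SELECT p.ref_num, t.max_accepted_at_one_time, COUNT(a.id)
    FROM person p
    JOIN th_person t ON t.person_id = p.id
    LEFT JOIN animals a ON a.accepted_by_th_person = t.id
    GROUP BY p.id, t.id
""").fetchall()
print(rows)  # [('P001', 2, 1)]
```

Note that the answer's schema does not yet record which types of animal a [TH] is willing to look after; that would be a further table keyed by th_person.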

Related Solutions

MySQL – How to model inheritance of two tables in MySQL

Since I made the diagram, I'd better answer ;)

Current relational databases unfortunately don't support inheritance directly, so you need to transform it into "plain" tables. There are generally three strategies for doing so:

(1) All classes¹ in a single table with NULL-able non-common fields.
(2) Concrete classes² in separate tables. Abstract classes don't have tables of their own.
(3) All classes in separate tables.
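For comparison, here is a minimal sketch of strategy (1) in SQLite (via Python's sqlite3): one table with a discriminator column, plus CHECK constraints that tie each NULL-able, non-common column to the class that uses it. The columns civil_notes and employer are hypothetical stand-ins for the unspecified class-specific fields. Because every row carries exactly one person_type, this layout cannot represent a person who is both or neither; the price is the NULL-able columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE person (
    person_id   INTEGER PRIMARY KEY,
    person_type TEXT NOT NULL CHECK (person_type IN ('civil', 'worker')),
    name        TEXT NOT NULL,  -- common field
    civil_notes TEXT,           -- hypothetical civil-only field (NULL-able)
    employer    TEXT,           -- hypothetical worker-only field (NULL-able)
    -- a class-specific column may be filled only for its own class
    CHECK (person_type = 'civil'  OR civil_notes IS NULL),
    CHECK (person_type = 'worker' OR employer IS NULL),
    -- and a worker must actually have an employer
    CHECK (person_type <> 'worker' OR employer IS NOT NULL)
)
""")

def accepted(sql: str) -> bool:
    """Run one statement; True if accepted, False if a constraint rejected it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

conn.execute("INSERT INTO person VALUES (1, 'civil', 'Ana', 'volunteer', NULL)")
conn.execute("INSERT INTO person VALUES (2, 'worker', 'Bo', NULL, 'Acme')")

worker_without_employer = accepted("INSERT INTO person VALUES (3, 'worker', 'Cy', NULL, NULL)")
civil_with_employer = accepted("INSERT INTO person VALUES (4, 'civil', 'Di', NULL, 'Acme')")
print(worker_without_employer, civil_with_employer)  # False False
```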

For more on what this actually means, and some pros and cons, please see the links provided in my original post. In a nutshell, (3) should probably be your default unless you have a specific reason for one of the other two. You can represent (3) in the database like this:

CREATE TABLE person (
    person_id int PRIMARY KEY
    -- Other fields...
);

CREATE TABLE civil (
    civil_id int PRIMARY KEY REFERENCES person (person_id)
    -- Other fields...
);

CREATE TABLE worker (
    worker_id int PRIMARY KEY REFERENCES person (person_id)
    -- Other fields...
);

CREATE TABLE event (
    event_id int PRIMARY KEY,
    person_id int REFERENCES person (person_id)
    -- Other fields...
);

Unfortunately, this structure will let you have a person that is neither civil nor worker (i.e. you can instantiate the abstract class), and will also let you create a person that is both civil and worker. The latter can be prevented at the database level (for example, with a type discriminator column that the child tables must match), and in a DBMS that supports deferred constraints³ even the former can be enforced in-database, but this is one of the few cases where application-level integrity checks might actually be preferable.
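One common way to rule out a person that is both civil and worker is to repeat a type discriminator in each child table and make the foreign key composite, so that a civil row can only point at a person whose type is 'civil'. The sketch below uses SQLite through Python's sqlite3 so it runs as-is; the person_type column and its values are assumptions, not part of the DDL above, and the syntax may need adjusting for MySQL. It still accepts a person with no child row at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE person (
    person_id   INTEGER PRIMARY KEY,
    person_type TEXT NOT NULL CHECK (person_type IN ('civil', 'worker')),
    UNIQUE (person_id, person_type)  -- target for the composite FKs below
);
CREATE TABLE civil (
    civil_id    INTEGER PRIMARY KEY,
    person_type TEXT NOT NULL DEFAULT 'civil' CHECK (person_type = 'civil'),
    FOREIGN KEY (civil_id, person_type) REFERENCES person (person_id, person_type)
);
CREATE TABLE worker (
    worker_id   INTEGER PRIMARY KEY,
    person_type TEXT NOT NULL DEFAULT 'worker' CHECK (person_type = 'worker'),
    FOREIGN KEY (worker_id, person_type) REFERENCES person (person_id, person_type)
);
""")

def accepted(sql: str) -> bool:
    """Run one statement; True if accepted, False if a constraint rejected it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

conn.execute("INSERT INTO person VALUES (1, 'civil')")
conn.execute("INSERT INTO civil (civil_id) VALUES (1)")  # types match: accepted

both = accepted("INSERT INTO worker (worker_id) VALUES (1)")   # person 1 is 'civil'
neither = accepted("INSERT INTO person VALUES (2, 'worker')")  # no worker row required
print(both, neither)  # False True
```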

¹ person, civil and worker in this case.

² civil and worker in this case (person is "abstract").

³ MySQL doesn't support deferred constraints.

Anomalous Updates in Normalized Database

Normalization is the formal process for removing redundancy from relations by taking projections which, when joined back together, reproduce the original relation, thus eliminating some redundancy without losing data. It is the science underlying database design. The first three normal forms, and BCNF, deal specifically with eliminating redundancy caused by functional dependencies; BCNF, the strongest of them, requires that every non-trivial functional dependency be implied by a candidate key, i.e. that its determinant be a superkey. Higher normal forms deal with other kinds of dependencies to eliminate further redundancies. Even when a design is fully normalized (5NF is generally considered the "final" normal form, although there are four others in the literature), redundancy can still remain, as not all redundancies can be removed by taking projections.
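To make "taking projections which, when joined back together, reproduce the original relation" concrete, here is a small pure-Python sketch that models a relation as a set of tuples (each tuple a frozenset of attribute/value pairs) and implements projection and natural join. The emp/dept/manager data is hypothetical; dept → manager is the functional dependency that causes the redundancy.

```python
from itertools import product

def relation(*rows: dict) -> frozenset:
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return frozenset(frozenset(r.items()) for r in rows)

def project(rel: frozenset, attrs: set) -> frozenset:
    """Projection: keep only the named attributes; duplicates collapse because it is a set."""
    return frozenset(frozenset((a, v) for a, v in t if a in attrs) for t in rel)

def natural_join(r: frozenset, s: frozenset) -> frozenset:
    """Natural join: merge every pair of tuples that agree on all shared attributes."""
    out = set()
    for t, u in product(r, s):
        dt, du = dict(t), dict(u)
        if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
            out.add(frozenset({**dt, **du}.items()))
    return frozenset(out)

emp = relation(
    {"emp": "Ann", "dept": "Sales", "manager": "Kim"},
    {"emp": "Bob", "dept": "Sales", "manager": "Kim"},
    {"emp": "Cal", "dept": "Ops", "manager": "Lee"},
)

r1 = project(emp, {"emp", "dept"})
r2 = project(emp, {"dept", "manager"})
print(len(r2))                      # 2: "Sales is managed by Kim" is now stored once
print(natural_join(r1, r2) == emp)  # True: joining the projections loses nothing

# Projections that share no attribute join into a cross product with spurious tuples.
lossy = natural_join(project(emp, {"emp"}), r2)
print(len(lossy), lossy == emp)     # 6 False
```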

Another tool for eliminating redundancy is the principle of orthogonal design, which states that two distinct relvars cannot have in them a tuple with the property that if it appears in the first relvar it must also appear in the second, and vice versa. But this principle only addresses redundancy across relvars, whereas normalization addresses redundancy within them, so it doesn't help with your example.

Ultimately, Date contends that we simply need more science to guide database design, since what we have today, as your example shows, isn't quite enough. One practical point about your example is that although there is redundancy, it can at least be controlled redundancy if a table is defined to hold the dancers, all key, with name and birth date. Name and birth date then become a foreign key in the dances table referencing the dancers table, and that foreign key can be defined to cascade updates. Then, if a particular dancer's birth date is found to be in error and is corrected, the DBMS will automatically update every row in the dances table where that dancer is listed. Moving control of the redundancy from the user to the system is a big step forward that today's SQL DBMSs can already deliver.
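A minimal sketch of that controlled redundancy, using SQLite via Python's sqlite3. The original dances table isn't shown, so its columns here (dance_id, name, birth_date) and the sample data are hypothetical; the point is the composite foreign key declared ON UPDATE CASCADE. SQLite only enforces foreign keys, and their cascades, after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for FK checks and cascades in SQLite
conn.executescript("""
CREATE TABLE dancers (
    name       TEXT NOT NULL,
    birth_date TEXT NOT NULL,
    PRIMARY KEY (name, birth_date)  -- "all key"
);
CREATE TABLE dances (
    dance_id   INTEGER NOT NULL,
    name       TEXT NOT NULL,
    birth_date TEXT NOT NULL,
    PRIMARY KEY (dance_id, name, birth_date),
    FOREIGN KEY (name, birth_date) REFERENCES dancers (name, birth_date)
        ON UPDATE CASCADE
);
""")
conn.execute("INSERT INTO dancers VALUES ('Mia', '1990-01-02')")
conn.executemany("INSERT INTO dances VALUES (?, 'Mia', '1990-01-02')", [(1,), (2,), (3,)])

# Correct the birth date once, in dancers; the DBMS rewrites every matching dances row.
conn.execute("UPDATE dancers SET birth_date = '1990-02-01' WHERE name = 'Mia'")
dates = {d for (d,) in conn.execute("SELECT birth_date FROM dances")}
print(dates)  # {'1990-02-01'}
```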

All of this information is paraphrased from Date's excellent book Database Design and Relational Theory which provides a significant amount of thinking and detail around just this issue. It is indeed the case that we stand on the shoulders of giants.
