Database Schema for an address book with different mailing lists

normalization

I'm helping a friend out with taking an old excel, single-page database which contains around 12 different columns that store information about:

  • person first+last name
  • email(s)
  • current address
  • phone(s)
  • sorting code
  • receives mailing list
  • other information relating to their church and first cursillo attended

I started to normalize this into different tables.

person | address | email | phone | mailing | church | cursillo

I'm sure there's probably a few more tables I can get, the current database has a lot of missing data and some fields contain multiple email accounts or multiple phone numbers.

My friend doesn't have the time to learn about different databases or to learn programming so I want to make everything as simple as possible for him. For the new website, we're looking at setting it up in WordPress; the previous one was built by him in iWeb.

Ideally we want to be able to easily extract different mailing lists like:

Send e-mail to everyone who attended cursillo 50  
  OR  
Send e-mail to everyone who wants the monthly mail (mailing is just yes/no field option linked to person id)  
  OR  
Send e-mail to the new cursillo team for the upcoming weekend  
  OR  
Send e-mail to people who attend a specific church  

etc.

However after splitting the tables up, I find that I'm not sure how easy it will be to maintain. What if my friend wants to change John Smiths' phone number, he'll have to go into person table and find the id for John Smith then go into the phone table to change the number.

What should I do to minimize the effort required to maintain such a database? as in, should I really normalize the database to 3NF/BCNF?

Best Answer

I would leave the data in a single table. Actually, the data is normalized now in the sense that you have a single Person table. That person is identified by a name, and the remaining columns are data about that Person. Each column depends on the name and varies with the name, which means it is already in 3NF. If you were building a large application, say a corporate application, then it might make sense to create tables for email, phone, address, etc. But since this is just a simple mailing list I would keep it one table. I don't think you have value in having a table for addresses or churches, for example, when neither of these have meaning to you other than as they relate to the current person.

Now if you had do have a lot of data about that address, or about that church, for example, then these "entities" would need to become tables to store the data about each address or church. Say your mailing list has the church name, church address, pastor, and average attendance. Now you have a reason to create a church table. You have columns - address, pastor, average attendence - that depend only on church name, not person name. But if all you have is church name there is really no benefit in this case of creating a church table that has just an id and a name.

One area where you might normalize would be to remove the repeating groups - the email(s) and phones(s) - into child tables. But you could also create multiple columns - say 3 for each - and I'm sure that would be plenty for this application. While not normalized, the only downside in doing this is if you need to store 4 (unlikely, and this is just a mailing list) you are out of luck. It also makes the query more complicated in that if you want to return all the phone numbers you have to select each column instead of joining to the child email table.

Bottom line is that normalization is about removing redundancy. It sounds like you have very little redundancy in the mailing list to begin with and breaking individual columns in the current mailing list into their own tables, if there is are no columns for those tables other than an id and a name, would be overkill.