Database Design – Thorough Design vs Quick and Dirty Design Comparison

Tags: database-design, database-recommendation

Take the representation of an Address. Below is a complete, very detailed implementation:

[image: detailed address representation]

Here instead is a quick implementation, which contains more or less the same fields (assume every field in the former is also contained in the latter):

[image: quick address representation]

If I had to decide which is closer to a normalized, academically correct design, I'd say the first one; however, if I were starting a project, I would go with the second.

Do you agree with this assessment? And if so, how does one deal with the tension? Should I:

  1. Start with a simple database and, when the time comes, refactor it into a more normalized/academic design.
  2. Start with something as close as possible to the academic design.
  3. Stick with the quick-and-dirty solution.

Best Answer

Trying to normalize addresses is generally a bad idea: there isn't much value in it. Both of your designs are inappropriate for the vast majority of systems.

There are two things you typically do with addresses:

  1. Use them to send mail or packages to that location.
  2. Use them to do geospatial analysis on that location.

Since you are using states, provinces, and boroughs in your design, and not prefectures, for example, I'm assuming that you are working in a North American context. If that's true, then you have well-established postal authorities (USPS, CPC) with well-regulated postal data and readily available address data quality tools. Even if you are working outside the US/Canada, there are probably data quality tools that will do what you need.

With validation and standardization of your address data, you can make sure that you are able to meet your first goal.
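To make the idea of standardization concrete, here is a toy sketch. A real address data quality tool (for example, a CASS-certified service working against USPS data) validates the whole address against the postal authority's database; the tiny abbreviation table below is purely illustrative, not a real USPS list.

```python
import re

# Illustrative sample of suffix abbreviations (NOT a complete USPS table).
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "BOULEVARD": "BLVD", "ROAD": "RD"}

def standardize_line(line: str) -> str:
    """Uppercase, drop periods, collapse whitespace, abbreviate common suffixes."""
    cleaned = re.sub(r"\s+", " ", line.strip().upper().replace(".", ""))
    return " ".join(SUFFIXES.get(token, token) for token in cleaned.split(" "))

print(standardize_line("123  Main Street"))  # → 123 MAIN ST
```

The point is not the string munging itself but that every address passes through one canonicalizing gate before it is stored.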

Using ZIP+4 in the US and Postal Code in many other countries, you can get everything you need for your second goal.
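As a sketch of the second goal: once each address carries a validated postal code, basic geospatial analysis reduces to a join against a postal-code reference table. The table and centroid coordinates below are invented placeholders, not real postal data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE address (id INTEGER PRIMARY KEY, line1 TEXT, postal_code TEXT);
CREATE TABLE postal_centroid (postal_code TEXT PRIMARY KEY, lat REAL, lon REAL);
-- Placeholder rows for illustration only.
INSERT INTO address VALUES
    (1, '123 MAIN ST', '10001'), (2, '9 ELM AVE', '10001'), (3, '4 OAK RD', '60601');
INSERT INTO postal_centroid VALUES ('10001', 40.75, -73.99), ('60601', 41.88, -87.62);
""")

# Count addresses per postal code and attach the region's centroid.
rows = con.execute("""
    SELECT a.postal_code, COUNT(*) AS n, c.lat, c.lon
    FROM address a JOIN postal_centroid c USING (postal_code)
    GROUP BY a.postal_code
    ORDER BY n DESC
""").fetchall()
print(rows)  # → [('10001', 2, 40.75, -73.99), ('60601', 1, 41.88, -87.62)]
```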

A lot of people are tempted to break addresses down into granular fields. This is a reaction to how bad address data typically is when all you have is "address_line_1, address_line_2, ...". However, breaking lousy, unvalidated city names out into their own field only means you have a smaller pile of garbage instead of a larger one. The only way to solve this is to use an address data quality tool to validate and standardize your addresses. If you attempt to normalize your address data, you end up with a big pile of many-to-many associations, because real-life addresses don't fit into the neat hierarchies you would see in a textbook.

Unless you have some really specialized need for addresses, just keep your tables simple (a few address lines, with maybe the postal code broken out) and get a good address data quality tool to scrub the data on the way in.