Database Design – Thorough Design vs Quick and Dirty Design Comparison

Tags: database-design, database-recommendation

Take the representation of an Address. Below is a complete, very detailed implementation:

[image: detailed address representation]

Here instead is a quick implementation, which contains more or less the same fields (assume every field in the former is also contained in the latter):

[image: quick address representation]

If I had to decide which is closer to a normalized, academically correct design, I'd say the first one; however, if I were starting a project, I would go with the second.

Do you agree with this assessment? And if so, how does one deal with the tension? Should I:

  1. Start with a simple database and, when the time comes, refactor it into a more normalized/academic design.
  2. Start with something as close as possible to the academic design.
  3. Stick with the quick-and-dirty solution.

Best Answer

Trying to normalize addresses is generally a bad idea: there isn't much value in it. Both of your designs are inappropriate for the vast majority of systems.

There are two things you typically do with addresses:

  1. Use them to send mail or packages to that location.
  2. Use them to do geospatial analysis on that location.

Since you are using states, provinces, and boroughs in your design, and not prefectures, for example, I'm assuming that you are working in a North American context. If that's true, then you have well-established postal authorities (USPS, CPC) with well-regulated postal data and readily available address data quality tools. Even if you are working outside the US/Canada, there are probably data quality tools that will do what you need.

With validation and standardization of your address data, you can make sure that you are able to meet your first goal.
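To make the idea of standardization concrete, here is a toy sketch. A real address data quality tool (for example, a CASS-certified service working against USPS data) validates the whole address against the postal authority's database; the tiny abbreviation table below is purely illustrative, not a real USPS list.

```python
import re

# Illustrative sample of suffix abbreviations (NOT a complete USPS table).
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "BOULEVARD": "BLVD", "ROAD": "RD"}

def standardize_line(line: str) -> str:
    """Uppercase, drop periods, collapse whitespace, abbreviate common suffixes."""
    cleaned = re.sub(r"\s+", " ", line.strip().upper().replace(".", ""))
    return " ".join(SUFFIXES.get(token, token) for token in cleaned.split(" "))

print(standardize_line("123  Main Street"))  # → 123 MAIN ST
```

The point is not the string munging itself but that every address passes through one canonicalizing gate before it is stored.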

Using ZIP+4 in the US and Postal Code in many other countries, you can get everything you need for your second goal.
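As a sketch of the second goal: once each address carries a validated postal code, basic geospatial analysis reduces to a join against a postal-code reference table. The table and centroid coordinates below are invented placeholders, not real postal data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE address (id INTEGER PRIMARY KEY, line1 TEXT, postal_code TEXT);
CREATE TABLE postal_centroid (postal_code TEXT PRIMARY KEY, lat REAL, lon REAL);
-- Placeholder rows for illustration only.
INSERT INTO address VALUES
    (1, '123 MAIN ST', '10001'), (2, '9 ELM AVE', '10001'), (3, '4 OAK RD', '60601');
INSERT INTO postal_centroid VALUES ('10001', 40.75, -73.99), ('60601', 41.88, -87.62);
""")

# Count addresses per postal code and attach the region's centroid.
rows = con.execute("""
    SELECT a.postal_code, COUNT(*) AS n, c.lat, c.lon
    FROM address a JOIN postal_centroid c USING (postal_code)
    GROUP BY a.postal_code
    ORDER BY n DESC
""").fetchall()
print(rows)  # → [('10001', 2, 40.75, -73.99), ('60601', 1, 41.88, -87.62)]
```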

A lot of people are tempted to break addresses down into granular fields. This is a reaction to how bad address data typically is when all you have is "address_line_1, address_line_2, ...". However, breaking lousy, unvalidated city names out into their own field only means you have a smaller pile of garbage instead of a larger one. The only way to solve this is to use an address data quality tool to validate and standardize your addresses. If you attempt to normalize your address data, you end up with a big pile of many-to-many associations, because real-life addresses don't fit into the neat hierarchies you would see in a textbook.

Unless you have some really specialized need for addresses, just keep your tables simple (a few address lines, with maybe the postal code broken out) and get a good address data quality tool to scrub the data on the way in.