Normalization for postal addresses in US (address, county, city, state, zipcode)

database-designdenormalizationnormalization

I've been trying to understand for the last few years which way is good for storing addresses. I've been getting "normalize all the way" but also "denormalize as much as you can" and I just cannot get my head on deciding what is good for my project.

Shortly, my project would involve lots of users (100k+) and all users will have 1-3 addresses stored (personal, business and billing). That means that I can have 100k+ * 3 records for addresses. Also, I will be doing a lot of look-ups by zipcodes (get users that have addresses registered into a zipcode).
I will only have U.S addresses.

I am happy with the user-to-address tables and their relationship for my project. However, the tables without relationships is what drives me nuts.

(My tables displayed in the image are like that just for me to get a better view on what I need and how do to it. I know there are a lot of redundant fields so please don't take them as they are.)

Does anyone has any tips on how should this be designed?
Does anyone has a link or something to a schema or a similar schema of what big companies use (UPS, USPS, etc)?

enter image description here

Best Answer

I think that @datagod's answer is good, but I would tweak it a little based on your stated requirements:

Address Table

AddressLine1 varchar(255) -- If using SQL Server I would go with NVARCHAR instead. You don't seem to need unicode support but why not support it since things will often be converted to unicode in the application layer by default anyway, and storage is cheap.
AddressLine2 varchar(255)
City varchar(50)
ZipCodeID int -- FK to PostalCode table
County varchar(50)
State      varchar(50)

As you can see my recommendation is very similar to @datagod's. I changed two things:

  1. I got rid of the Country FK since you stated you only need addresses for the United States.
  2. I made ZipCode/PostalCode a FK. I think this will allow you to index/query zip codes more effectively.

Furthermore I feel like you don't need to upload an external master list of zip codes unless you want to use that list for data validation purposes... You can check if that zip code exists when the address is inserted and insert it into the zip code table if it doesn't exist. This will add some overhead on insert but I don't think it will be that much since common zip codes will be inserted pretty quickly.

If you are going to move international then I would definitely add the Country table as suggested by @datagod.

Normalizing the database down to cites/counties/streets etc. seems like overkill to me at this point unless any of the following apply:

  • You find yourself querying by those data points often and would benefit from the indexes/normalization
  • You have to do some sort of region based security, ie Sales located in Atlanta can't access information outside of these three counties.
  • You want to use those lists as data validation to make sure people aren't giving you bad data. (This seems like it will be a mess to implement though depending on how far you want to validate the data.)
  • Some other reason that I haven't thought of that makes further normalization to make your life easier.

I don't have @datagod's experience with millions of address records so my advice might be plain wrong, but it is the approach I would take.

Edit: Two answers are eschewing normalizing the zip code now so I might be overlooking a pain point because I haven't experienced it.