I've been trying to understand for the last few years which way is good for storing addresses. I've been getting "normalize all the way" but also "denormalize as much as you can" and I just cannot get my head on deciding what is good for my project.
Shortly, my project would involve lots of users (100k+) and all users will have 1-3 addresses stored (personal, business and billing). That means that I can have 100k+ * 3 records for addresses. Also, I will be doing a lot of look-ups by zipcodes (get users that have addresses registered into a zipcode).
I will only have U.S addresses.
I am happy with the user-to-address tables and their relationship for my project. However, the tables without relationships is what drives me nuts.
(My tables displayed in the image are like that just for me to get a better view on what I need and how do to it. I know there are a lot of redundant fields so please don't take them as they are.)
Does anyone has any tips on how should this be designed?
Does anyone has a link or something to a schema or a similar schema of what big companies use (UPS, USPS, etc)?
Best Answer
I think that @datagod's answer is good, but I would tweak it a little based on your stated requirements:
Address Table
As you can see my recommendation is very similar to @datagod's. I changed two things:
Country
FK since you stated you only need addresses for the United States.ZipCode
/PostalCode
a FK. I think this will allow you to index/query zip codes more effectively.Furthermore I feel like you don't need to upload an external master list of zip codes unless you want to use that list for data validation purposes... You can check if that zip code exists when the address is inserted and insert it into the zip code table if it doesn't exist. This will add some overhead on insert but I don't think it will be that much since common zip codes will be inserted pretty quickly.
If you are going to move international then I would definitely add the
Country
table as suggested by @datagod.Normalizing the database down to cites/counties/streets etc. seems like overkill to me at this point unless any of the following apply:
I don't have @datagod's experience with millions of address records so my advice might be plain wrong, but it is the approach I would take.
Edit: Two answers are eschewing normalizing the zip code now so I might be overlooking a pain point because I haven't experienced it.