Best way to store country and city information

database-designperformancequery-performance

I always wondered what is more efficient between two. There is one "main" table where I store all user related informations. Also, there are two more tables, country and city tables. city table contains column country_id so I can relate to which country this city belongs.

Now, the main question : should I store only city_id in user table and query for country name using country_id from city table or to store both, city_id and country_id ?

======

Other thing is related to counting. If have some fourth table, like message, and I store ID of user who send message, ID of user who receive message, text, etc. Now, if I want to know how many messages some user sent, I can query for messages where sender=ID and count rows, but I used another approach. I created column count in user table and whenever user send message, I increase this number. Is this approach good and if no, is the first one efficient since that table can have billions of rows.

Best Answer

Countries are relatively stable so it is easy to have a countries table. City names can be duplicated multiple times within a country. Maintaining the city table may be difficult for a couple of reasons.

  • Some cities have different spellings in different languages.
  • Some countries have multiple cities with the same name. These are likely in different states or provinces.

You may want to allow free format for the city and keep it in the user table. Otherwise, investigate adding a state/province table. You will still need to deal with how to add new cities as needed.

If you index the sender, the count can be done on the index. This can be extremely efficient for smaller counts. Updating the user table for each message will increase the log data volume, and increase the cost of saving each message. It is also error prone, and can lead to deadlocks if you are not consistent in the order the accesses are done. De-normalizing the model should be done only after determining you have a performance problem.

A query like SELECT COUNT(1) FROM message WHERE sender=?ID is likely better than maintaining a count.