How to link a customer to a country when the city/state isn’t known

database-designforeign key

We have a situation where medical systems are sold to hospital groups, but at the time of the order, we don't always know into which state and city the system will be installed. We always know the country, as hospital groups never span countries.

We want to be able to save as much information as we know, so if we know the city, state and country, we want to save all three. If we only know state and country, we want to save those two. If we only know the country, we want to save just that.

Now, I can see three ways of doing this, and was hoping someone would be able to comment as to which is preferable…

1) Have three tables, Countries, States and Cities, each of which is linked from the previous, so any city will have a StateID, and any state will have a CountryID. That way, if we know the city, we can work up the links and find out the state and country. This is neat, and preserves data integrity, but has the disadvantage that if we don't know the city, we can't save it.

One way we thought of to get around this was for each country to contain a placeholder state, that wouldn't be shown in the UI, and for each state to hold one placeholder city. That way, if we only know the country, the placeholder state for that country would be saved, as would the placeholder city for the state. This neatly gets around the problem, but requires placeholder states and cities to be inserted. This isn't a performance issue, as there wouldn't be enough of them to be a problem, but feels a bit like a hack.

2) Another approach is to have a single self-referencing Locations table, that contains a LocationTypID that identifies if the location is a country, state or city, as well as a ParentLocationID, which (if non-null) would refer to the containing location (ie states contain cities and countries contain states). That way, we can save the most granular location that we know.

The disadvantage of this is that the querying would be messy. We are using an ORM, which would work very well with the first approach, as you can choose a country, then navigate to the Regions property to see the associated regions, then for any region, use the Cities property. Approach #2 would require a manual query every time you wanted to get a different entity.

This approach is very flexible, but is probably more flexible than we need, and probably not worth the extra effort required to implement it.

3) The final suggestion was for the Systems table to have a CountryID, a StateID and CityID, and populate whichever one(s) is/are known. This is simple, and doesn't require any placeholder entries, but has one huge disadvantage (in my non-expert opinion), and that is that it seems to involve repeated data which can easily become invalid.

For example, suppose CountryA contains StateA, which in turn contains CityA, whereas CountryB contains StateB, which in turn contains CityB – imagine the same with Cs. There is little to stop someone mistakenly setting a system to be in CountryA, StateB and CityC, even though CityC is not in StateB, and StateB is not in CountryA.

Having played around with this a bit, my feeling is strongly towards the first approach, despite the requirement for placeholder rows, but I would like to hear what the experts have to say.

Anyone any comments?

Best Answer

Use a variation of number 2 - a location table with a involuted (self-referencing)relationship, but add something to make the hierarchy management easier. For example, use nested sets/visitation numbers (see here) or an adjacency list.

The advantages of this approach are that:

  • You have a single foreign key from your transaction to your location table. This keeps your queries simple insofar as you don't need multiple joins and either/or logic.
  • The foreign key can point to as vague or as specific a location as you need.
  • You can easily traverse the hierarchy (since you'd be using a technique like nested sets) to get the level of geography of interest.
  • You can accommodate countries that have more or less than one geographical division between the country and the city, without having to create dummy entries or duplicating data.

I don't know what your ORM is, but it should be able to handle a self-referencing relationship and a partitioning attribute (i.e. GeographyLevel = { country, state, city })