Today we were discussing the following:
Brazil has 27 states and each one has its own abbreviation (just like USA).
So we've got RJ
for Rio de Janeiro, SP
for São Paulo, MG
for Minas Gerais and so on.
One of our programmers proposed that we should use those abbreviations (RJ
, SP
, MG
, etc.) as PK
on the States
table that we are planning to add to a new project.
Extrapolating the use of our database, I countered his argument saying that if we were to – someday – expand our services to other countries, we would have a problem with repeated abbreviations, for example: in the USA there is SC
for South Carolina and on Brazil we've got SC
for Santa Catarina; the same happens for MT
, PA
and MA
. Counting on this we've agreed that there should be a ID
column as PK IDENTITY
.
Now, supposing that we do not expand our services to other countries and stay only in Brazil I started considering the idea of using a VARCHAR(2) column as PK. In this scenario it doesn't sound like a totally bad idea.
Is it? Why? In which cases this could be applied? Should memory be considered in order to choose from one to another?
Best Answer
In most cases I agree with @Aaron's approach, but this just happens to be one of the few instances of there being a truly natural key to use: ISO codes (ISO 3166 specifically). For those who are unfamiliar with ISO, they are, in their own words (from the main http://www.iso.org/ page):
ISO 3166-1 describes the country codes. While there are a few choices of codes to use (2 character, 3 character, and numeric), the 2 character codes are the recommended choice and the most widely used (including for most country-based top-level domain names).
ISO 3166-2 describes each country's subdivisions (e.g. states). This standard is divided into country-based sections (e.g. ISO 3166-2:BR for Brazil) and the codes are 1, 2, or 3 alphanumeric characters.
So, you could do something like:
The idea here is that you would place the
CountryCode
andCountrySubdvisionCode
fields in any tables that needed the "state" value, and FK back todbo.CountrySubdvision
on(CountryCode, CountrySubdvisionCode)
.While this method, using 3 - 5 bytes depending on the Country / Subdvision, is not as compact as using a 2-byte
SMALLINT
, it does have the advantage of placing human-readable / meaningful values in those related tables. This could easily reduce some number of JOINs (when you only need the 2-character code anyway) and can (at least slightly) reduce time spent debugging certain problems.Please note that I specified a
_BIN2
(i.e. binary) collation for both of the "Code" fields to assist performance. Even with an extra couple of bytes, that should be nearly as fast as comparing twoSMALLINT
values. The only downside there is that you need to be consistent in adding codes in all upper-case and people need to remember that they need to use all upper-case when filtering on these codes.Please also note that I added the
ON UPDATE CASCADE
clause to the FK ondbo.CountrySubdvision
to handle the off-chance that aCountryCode
is changed by ISO (extremely unlikely). Along those same lines, the FKs created on tables where theCountryCode
andCountrySubdvisionCode
fields have been placed will also need to be set toON UPDATE CASCADE
. This will propagate changes made toCountryCode
indbo.Country
to the tables that have both of the fields. This also will propagate any changes made to theCountrySubdvisionCode
to those related tables. This is also highly unlikely to happen (especially for where most of our systems would be dealing with) though more likely than any changes being made to theCountryCode
.UPDATE
Just to have this stated, if using this method (i.e. ISO codes) to denote "States" / "Provinces" and possibly Countries, then you do technically have the ability to start out handling only "States" for a single country. In this configuration you would simply remove the
dbo.Country
table and theCountryCode
field (and associated FK) fromdbo.CountrySubdvision
. Then you would only haveCountrySubdvisionCode
to place in any related tables.Now, if you later find that you need to expand the system to handle other countries, you can, at that time, do the following steps (none of which change any existing
CountrySubdvisionCode
values in any of the related tables):dbo.Country
tabledbo.Country
tabledbo.CountrySubdvision
dbo.CountrySubdvision
CountryCode
field todbo.CountrySubdvision
, making itNOT NULL
and with aDEFAULT
of the CountryCode for the only country that you have been using so far (thisDEFAULT
can be removed at the end of this process, if you like)dbo.CountrySubdvision
to be on(CountryCode, CountrySubdvisionCode)
dbo.CountrySubdvision
for theCountryCode
field to reference thedbo.Country
tableCountryCode
field to all of the related tables that already have theCountrySubdvisionCode
field, making itNOT NULL
and with aDEFAULT
of the CountryCode for the only country that you have been using so far (thisDEFAULT
can be removed at the end of this process, if you like)dbo.CountrySubdvision
tableCountryCode
fields fromdbo.CountrySubdvision
and the related tables