A column value depends on the values of two other columns, of which exactly one is nullable

database-designnormalizationrdbms

In the Person table, either the Citizen_IdentityID or the Foreigner_WorkPermitID will contain NULL marks.

The PersonID_ID holds either the value of the Citizen_IdentityID or Foreigner_WorkPermitID (whichever that does not retain a NULL mark).

In other words, PersonID_ID is dependent on the value of either Citizen_IdentityID or Foreigner_WorkPermitID, and there are other attributes (eg. IDColor) that depends only on PersonnID_ID.

How should I structure the design for normalization?

The above entities are just a simple substitution of the actual use case.

Best Answer

If I'm understanding your description, PersonID_ID is a duplicate of Citizen_IdentityID or Foreigner_WorkPermitID (whichever is not NULL). If this is true, I'd suggest something similar to what DCook has suggested:

  • remove Citizen_IdentityID and Foreigner_WorkPermitID (eliminate duplicate data)
  • add a Person_Type (char(1) not NULL) column; current values could be 'C' or 'F' for Citizen/Foreigner, respectively

Depending on where your data is coming from, this design would allow for the inclusion of future person types (eg, in the US you have citizens, visa holders, green card holders; in some countries there are clear differences in citizen vs resident vs visitor vs non-visitor visa - not to mention different types of visas). Obviously the need for additional person types will depend on what you're trying to model.


Looking at this a bit more ...

Can an individual's type/status change over time (eg, foreigner becomes citizen; citizen relinquishes citizenship and becomes a foreigner)?

The reason I ask is that you'd obviously need to change the value of the Person_Type and PersonID_ID columns ... probably not a big deal ... but based on your model it appears you'll be using PersonID_ID as part of the PK in at least one other table, so that table's PK would also need to be updated ... while certainly doable, this could quickly get messy depending on the number of tables and records that would need their PK modified, plus the hassles of working around RI constraints. The mess gets a little nastier if you also need to maintain some sort of history/audit-record for a person (eg, having to store/map the changing of PKs ... yuck!).

So now I'm wondering if PersonID_ID should be a separate, truly unique value, independent of the Person_Type? and if so, then you'd probably need to retain the Citizen_IdentityID/Foreigner_WorkPermitID value, but perhaps give it a more generic name.

So, that would change my suggestion to something like:

  • add a Person_Type (char(1) not NULL) column; current values could be 'C' or 'F' for Citizen/Foreigner, respectively
  • change PersonID_ID to a be it's own unique value (could be done through something like an identity column or a unique-PK generator); this value would never change, even if Person_Type were to change over time
  • replace the Citizen_IdentityID/Foreigner_WorkPermitID columns with a single, generic ID that serves as a FK to the Citizen/Foreigner tables, as appropriate); perhaps something like Person_TypeID (int not NULL)?

OK, one more concern/question ...

Can a person ever be associated with more than 1 Person_Type (C/F)?

For example, if a foreigner goes through the naturalization process to become a citizen, how would this be represented in your model?

Would the person now have both a Foreigner_WorkPermitID and a Citizen_IdentityID, thus requiring the storage of both values in the Person table?

Or would you update the Person table to show a change from 'Foreigner' to 'Citizen' (ie, change Person_Type and Person_TypeID)?