Design a table that will use some columns just in certain times

database-design

I have a User with one or more saved Documents, but each Document has specific attributes in Brasil for example we have:

  • RG(General Register) have the following attributes:

    1. A number

    2. Expedition Date

    3. Emission Org

  • CPF(Like a Social Security Number) have the following attributes:

    1. A number
  • CNPJ(Like a National Registry of Legal Entities) have the following attributes:

    1. A Name (of a company)

    2. A Number

  • CNH(Driver's License) have the following attributes:

    1. A Name (Of a person)

    2. A Number

    3. Expedition Date

    4. Valid Date

Then I designed the following model:

enter image description here

FK Document Type will control what attributes of table will be null, for example ifFK Document Type appoint to CPF the Valid Date attribute will not be used.

That's the correclty way to design a system with more than one type of document? I think that is not good make one table to each document type.

Best Answer

Instead of storing number of repeating DocType names you store the ids of the DocType descriptions from the reference table. This is a common practice called normalization.

The sparsed table with nullable attributes is good enough while the number of possible attributes isn't too big. Sometimes it's more efficient to store all the attributes in the aux table docid -- attrname -- value containing only not-null values and join them whenever you need.

SELECT ud.*, da.*
  FROM UserDocuments  AS ud
  JOIN DocsAttributes AS da ON da.doc_id = ud.id
 WHERE ud.ID = 12345      -- for certain document only

The only difference is that by join you'll get not the single row but the table like that:

            UserDocuments      |       DocsAttributes 
 ------+----------+------------+--------+----------+-----------
    id | FK_users | FK_DocType | doc_id | attrname |     value
 ------+----------+------------+--------+----------+-----------
 12345 |      456 |          7 |  12345 | RGNumber |       9876
 12345 |      456 |          7 |  12345 |  RGEDate | 2019-05-01
 12345 |      456 |          7 |  12345 |   RGEOrg |       7777
 12345 |      456 |          7 |  12345 |   CPFNum |      88888

This table can be easily parsed on the app side.

An advantage of that approach is that you can add as many attributes as you want. Attributes names can be separated into the reference table for the normalization reasons.

The main disadvantage is that all attributes should be converted into the same type (VARCHAR() for example) and some functionality can be lost - like datetime arithmetics and/or timezone conversions.

Having a number of tables for each type of document isn't a good design.

There is no universal recipe and you have to decide what approach is suitable for you. Sometimes single table with hardcoded list of attributes is preferrable. Sometimes an expandable table of EAVs (entity-attribute-value) fits better.