Using “Lookup Table”’s Rows Instead of Columns for Tables

database-designtable

I am currently working on designing a database that will contain information gathered by a form. However, the design of these forms is in constant limbo, with it being highly likely that the information collected may change significantly over the years.

Because of this, it is necessary that, should our team need to add on a new attribute, we can add without disturbing the setup of our schema too heavily.

Such as, consider a table of clients

+----------+-----------+----------+
| ClientID | firstName | lastName |
+----------+-----------+----------+
|        1 | Alice     | Anderson |
|        2 | Bob       | Brown    |
|        3 | Charlie   | Carlson  |
+----------+-----------+----------+

However, in the future, we need a middle name column. This will cause the old problem of either making the column nullable (even if data should always be present from the new form) and casting all records previously made to NULL for that field, or including dummy information for all of the old records.

A coworker of mine suggested, instead, to use a sort of a lookup table that, to me, acts almost as if the rows replace the specific column information. Instead, the client table would look like this:

 +----------+-------------+-------------+
 | ClientID | contentType | contentData |
 +----------+-------------+-------------+
 |        1 | FIRST NAME  | Alice       |
 |        1 | LAST NAME   | Anderson    |
 |        2 | FIRST NAME  | Bob         |
 |        2 | LAST NAME   | Brown       |
 |        3 | FIRST NAME  | Charlie     |
 |        3 | LAST NAME   | Carlson     |
 +----------+-------------+-------------+

and there would be a lookup table that has individual definitions that relate to the content types within groups (most likely just basically tables) like so:

 +--------------+----------------+-------------+----------+
 | definitionID | definitionName | collection  | isActive |
 +--------------+----------------+-------------+----------+
 |            1 | FIRST NAME     | clientInfo  | Y        |
 |            2 | LAST NAME      | clientInfo  | Y        |
 |            3 | PHONE          | contactInfo | Y        |
 |            4 | MAILING ADR    | contactInfo | N        |
 +--------------+----------------+-------------+----------+

Where the clientInfo collection is used within the client table and the contactInfo is used within the contact_information table, and on.

My coworker claims that this would be ideal, since we can simply add rows to the lookup table whenever we introduce a new attribute instead of adding a whole column to a table. In addition, if we ever needed a new attribute with a many-to-one relationship with a table, we could simply have multiple entries, like so:

+----------+-------------+-------------+
| ClientID | contentType | contentData |
+----------+-------------+-------------+
|        1 | FIRST NAME  | Alice       |
|        1 | LAST NAME   | Anderson    |
|        2 | FIRST NAME  | Bob         |
|        2 | LAST NAME   | Brown       |
|        3 | FIRST NAME  | Charlie     |
|        3 | LAST NAME   | Carlson     |
|        4 | FIRST NAME  | Daniel      |
|        4 | LAST NAME   | Dawson      |
|        4 | CHILD ID    | 1           |
|        4 | CHILD ID    | 2           |
+----------+-------------+-------------+

and add a new row to the lookup table like so:

+--------------+----------------+-------------+----------+
| definitionID | definitionName | collection  | isActive |
+--------------+----------------+-------------+----------+
|            1 | FIRST NAME     | clientInfo  | Y        |
|            2 | LAST NAME      | clientInfo  | Y        |
|            3 | PHONE          | contactInfo | Y        |
|            4 | MAILING ADR    | contactInfo | N        |
|            5 | CHILDID        | clientInfo  | Y        |
+--------------+----------------+-------------+----------+

To me, something about this just feels…wrong. It goes against a lot of what I learned in how to design tables and columns. In addition, I fear that it won't be very compatible with software that relies on columns as a structure instead of just rows. However, I can't help but admit that it seems like it would be nice for easily adding new attributes to tables.

My question is: is this common practice, and what are the advantages/disadvantages of it?

Best Answer

It is a common design pattern. You've already outlined the advantages. The disadvantages are:

  • It is tough to enforce any sort of data quality. What if the application has a bug and forgets to store a field?
  • It is tough to query for reporting purposes. What formerly would have been a wide table is now a really awkward SQL statement. BI tools will not support it out of the box and you'll end up having really complex views or ETL processes to untangle it.

I don't see why the problems you gave for the original design are insurmountable. If you need to add a column to an existing table, you analyze the impact that will have. Either it is so important you should populate it retroactively, or it isn't important and you can leave it alone.

If you are looking for a compromise solution, you could have a "flexible" schema stored in each row. For example, PostgreSQL and SQL Server support a JSON column type. Based on the fields in your form, you would stuff that JSON with the relevant data for that record at that time. Using this solution, each individual row is still somewhat legible to a human, but you have the overhead of parsing the JSON to find out what data elements are in it.