Mysql – How to store multiple data types for same column (for many columns)

eavMySQL

To preface, similar questions have been asked, but I have not found any that address a table that has a large number of columns affected by this issue. All the similar seem to address a table that has a small number of columns affected by this issue…

…but I have a MySQL database table that has many (e.g., 30) columns currently setup as the data type decimal(5,2).

Now, a new scenario has arisen where I need to enter a decimal(5,2) OR a char(2) (i.e., one or the other, but never both) for each of those 30 columns. To give you context, I am entering the quantity of elements detected in a sample and, if the quantity of an element is not detected, then I must be able to identify that an element was tested, but not detected, such as "ND". I also must know if the element was not yet tested. For example, if the column is DEFAULT NULL, then it may indicate that I have not yet tested the respective element. So, there are three possible situations related to this scenario:

  1. A decimal(5,2) value indicates that the element was tested and detected.
  2. A char(2) value indicates that the element was tested and not detected.
  3. A NULL(…?) value indicates that the element was not tested.

So, my initial thought was to change the change columns' type to a varchar, in order for the columns to be able to store either a char(2) or decimal(5,2) value, but that seems be a dangerous path called Entity-Attribute_Value, which may require me to build the application layer more to accommodate the limitations caused by it.

So, I read chapter 6 of the book "SQL Antipatterns: Avoiding the Pitfalls of Database Programming" (http://rads.stackoverflow.com/amzn/click/1934356557) and several other related posts, and the two most common answers seem to be:

1. Use a design that is based on the type of query you want to perform.

In my case, I want to perform multiple types of queries that compare column values of different rows, determine averages, and possibly unforeseen more.

2. Use the Class Table Inheritance design.

I would like to implement the Class Table Inheritance design, but that would require many more tables to be created for each column. It seems like an overly complex solution, given the many columns that need only one of two types of data.

For example, if my understanding about Class Table Inheritance is correct, then for each of the 30 columns in my base table, I must create a single table for each subtype and make their primary key also serve as a foreign key to the base table, which means that I must create 60 tables. It seems excessive to create 60 tables in order to deal with 30 columns that might have one of two data types. Also, it seems that I would need to figure a way to manage those 30 pairs of tables so they refer to the same column, possibly in the application layer.

One possible solution that comes to mind before I end this post is: Use the Class Table Inheritance design, but instead of creating 30 pairs of tables for my two data types, create just 2 tables for the 2 data types and add a column that stores a value that relates each pair, such as the name of the base table column that they would relate to. But this does not seem like good practice either.

Any suggestions?

Best Answer

I'm the author of the book you referenced. Thanks for reading my book!

I suggest you read chapter 14, "Fear of the Unknown."

For your case:

I am entering the quantity of elements detected in a sample and, if the quantity of an element is not detected, then I must be able to identify that an element was tested, but not detected, such as "ND".

I would not store the string value "ND". I would store NULL instead of a number when the value is unknown or inapplicable.

Then when you query it, you can use:

SELECT COALESCE(myNumericColumn1, 'ND') AS myNumericColumn1
FROM Mytable
...;

The COALESCE() function in SQL returns its first argument that is not NULL. In this example, if the numeric is NULL on a given row, the result will use the default value in the second argument, 'ND'.

You can also create a VIEW with this expression for each column, so you don't have to repeat it in every query.


Re your comment:

This is starting to sound like you need to store the results in rows, not _columns. One numeric result per row. Just two three columns:

  • Sample ID
  • Element ID
  • Numeric value

Then you can make a distinction between three states:

  • Presence of a row with a non-NULL numeric value.
  • Presence of a row, but NULL for the numeric value, indicating the test was done but the result was Not Detected.
  • Absence of a row, indicating the element test was not done.

When you want your client to display the results for a sample, you would fetch 30 rows (or fewer if not all elements were tested in a given sample), then write code in your client to loop over the 30 rows, and present the data however you want.