I'm not sure why one of the answers made such a bold statement. Regardless: everything can be modeled in the relational model.
create table entry
(
  id          number(12) primary key,
  name        varchar2(256) not null unique,
  customer_id number not null references customer(id)
);
create table datatypes
(
  id   number primary key,
  name varchar2(256) not null unique
);
create table entry_data
(
  id             number(12) primary key,
  datatype_id    number not null references datatypes(id),
  value          varchar2(256),  -- or CLOB; no index
  value_geometry sdo_geometry,   -- no index
  -- value_blob  blob,           -- if you need it; no index
  entry_id       number not null references entry(id)
);
Your task is a variant of the multi-valued attributes problem. The datatypes table abstracts your datatypes; however, two value_* columns are created in entry_data because your data can be different in nature (belong to different classes) and thus needs different storage. As for varchar2 vs. number vs. date: all of those can be placed into varchar2 and cast later as needed. And with very simple NVL logic in code you can determine which column was populated, value or value_geometry.
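A minimal sketch of that NVL-style logic, using SQLite (via Python's sqlite3) as a portable stand-in for Oracle; SQLite has no NVL, so COALESCE plays the same role, and the sample data and type names are assumptions, not part of the schema above:

```python
import sqlite3

# In-memory stand-in for the entry_data design: one row per attribute value,
# with only one of the value columns populated per row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE datatypes (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE entry_data (
        id INTEGER PRIMARY KEY,
        datatype_id INTEGER NOT NULL REFERENCES datatypes(id),
        value TEXT,            -- text/number/date, serialized to a string
        value_geometry TEXT    -- geometry kept in its own column
    );
    INSERT INTO datatypes VALUES (1, 'string'), (2, 'geometry');
    INSERT INTO entry_data VALUES (1, 1, '42', NULL);
    INSERT INTO entry_data VALUES (2, 2, NULL, 'POINT(1 2)');
""")

# NVL/COALESCE logic: whichever column is non-NULL "wins", and a CASE
# expression reports which column that was.
rows = conn.execute(
    "SELECT id, COALESCE(value, value_geometry) AS v, "
    "       CASE WHEN value IS NOT NULL THEN 'value' "
    "            ELSE 'value_geometry' END AS populated_col "
    "FROM entry_data ORDER BY id"
).fetchall()
print(rows)  # [(1, '42', 'value'), (2, 'POINT(1 2)', 'value_geometry')]
```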
Not discounting @gbn's suggestion (in a comment on the question) to possibly store this data somewhere other than in a RDBMS, I will say that if you do decide to go the RDBMS route, you are better off using a single table with a column per each "type" such that they can be strongly-typed. Or, you could use a single table with a single string field as all data can be converted/serialized to a string, but then you have to be careful to do the conversion properly in both directions for all operations, and that might prove problematic if some uses of that table either incorrectly convert or forget to convert altogether.
The reason for choosing a single table is mostly logistics: it is far easier to dynamically select a different column based on a COALESCE or CASE (or CASE-like) clause than it is to dynamically pull from different tables in a query based on the TypeID of the row (at least not without LEFT JOINing all "type" tables in all queries, yikes!).
Depending on which RDBMS you are using, there might even be vendor-specific features to assist in this model. For example, SQL Server, starting in version 2008, I believe, has a SPARSE option that can be added to NULLable columns. This allows a NULL value to take up 0 bytes, whereas normally, fixed-length fields (i.e. pretty much all of them that aren't [N]VAR{something}, {something} VARYING, XML, generically text, or SQL_VARIANT) would always take up their specified number of bytes. The downside is that non-NULL values now take up 4 extra bytes per row. But it can be a huge savings when the column is 75% or more NULL across all rows. This feature was designed specifically to address this type of model.
Just make sure to have a CHECK CONSTRAINT to ensure that, across all of the value_ fields, either none, or only one, of them is NOT NULL.
Then you can access via something like:
COALESCE(value_int, value_string, value_datetime, etc.)
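A runnable sketch of that at-most-one-NOT-NULL CHECK plus the COALESCE access path, using SQLite via Python's sqlite3 for portability (the table and column names here are hypothetical; in SQL Server you would sum CASE expressions rather than bare boolean comparisons):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# In SQLite a boolean expression evaluates to 0 or 1, so summing the
# IS NOT NULL tests counts how many value columns are populated.
conn.execute("""
    CREATE TABLE FieldValues (
        FieldID INTEGER PRIMARY KEY,
        ValueInt INTEGER,
        ValueString TEXT,
        ValueDatetime TEXT,
        CHECK (
            (ValueInt IS NOT NULL) + (ValueString IS NOT NULL)
            + (ValueDatetime IS NOT NULL) <= 1
        )
    )
""")
conn.execute("INSERT INTO FieldValues VALUES (1, 7, NULL, NULL)")
conn.execute("INSERT INTO FieldValues VALUES (2, NULL, 'hello', NULL)")

# Generic access: COALESCE picks whichever column is populated.
print(conn.execute(
    "SELECT FieldID, COALESCE(ValueInt, ValueString, ValueDatetime) "
    "FROM FieldValues ORDER BY FieldID"
).fetchall())  # [(1, 7), (2, 'hello')]

# Two populated columns violate the CHECK constraint.
try:
    conn.execute("INSERT INTO FieldValues VALUES (3, 1, 'both', NULL)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```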
And again, if you are using SQL Server (and maybe one or more other vendors have this feature), you can make use of "filtered indexes" to index each specific value_ field where that value is not null:
CREATE NONCLUSTERED INDEX [IX_FieldValues_ValueInt]
ON dbo.[FieldValues] ([ValueInt])
WHERE [ValueInt] IS NOT NULL;
This will allow you to target rows that do have values, although probably not when using the generic COALESCE access method, but when targeting a specific Value% field.
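SQLite calls the analogous feature "partial indexes"; a sketch of the IX_FieldValues_ValueInt idea via Python's sqlite3 (sample data is made up, and the exact query-plan text varies by SQLite version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FieldValues (
        FieldID INTEGER PRIMARY KEY,
        ValueInt INTEGER,
        ValueString TEXT
    );
    -- Partial index: only rows where ValueInt is populated are indexed.
    CREATE INDEX IX_FieldValues_ValueInt
        ON FieldValues (ValueInt)
        WHERE ValueInt IS NOT NULL;
    INSERT INTO FieldValues VALUES (1, 10, NULL), (2, NULL, 'x'), (3, 20, NULL);
""")

# A predicate that implies the index's WHERE clause (ValueInt = 20 implies
# ValueInt IS NOT NULL) lets the optimizer use the partial index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT FieldID FROM FieldValues WHERE ValueInt = 20"
).fetchall()
print(plan)  # plan detail mentions IX_FieldValues_ValueInt
```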
AND, looking again at the two options, I would actually go with Option 3: a single table. In Option 2 there does not seem to be much purpose in breaking field_type_id out, as that data is not really a parent to the field_value data. A field_id can only have a single value, right? So just put field_type_id in the field_value table, and make use of it in the CHECK CONSTRAINT that ensures at most a single field is NOT NULL, to ensure that the one field, if any, that is NOT NULL matches the field_type_id.
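A sketch of tying field_type_id into that CHECK, again using SQLite via Python's sqlite3; the type id values (1 = int, 2 = string) and column names are assumptions for illustration only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK permits a value only in the column that matches field_type_id.
conn.execute("""
    CREATE TABLE field_value (
        field_id INTEGER PRIMARY KEY,
        field_type_id INTEGER NOT NULL,
        value_int INTEGER,
        value_string TEXT,
        CHECK (
            (field_type_id = 1 AND value_string IS NULL)
            OR (field_type_id = 2 AND value_int IS NULL)
        )
    )
""")
conn.execute("INSERT INTO field_value VALUES (1, 1, 7, NULL)")      # int-typed: OK
conn.execute("INSERT INTO field_value VALUES (2, 2, NULL, 'abc')")  # string-typed: OK

# A value in a column that does not match field_type_id is rejected.
try:
    conn.execute("INSERT INTO field_value VALUES (3, 1, NULL, 'oops')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```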
Best Answer
This question (in various guises) crops up regularly. This type of "solution" is known as EAV (Entity-Attribute-Value) and is not a good idea. Take a look here or here for tips (or links to tips) on the problems it can cause and how to properly leverage the data types that your RDBMS offers.
Not using the correct data type is a surefire way to confuse the query optimiser and slow your system down. Joe Celko is a world-renowned expert in all matters SQL, and his opinions merit attention.
The fact that he uses the acronym "MUCK" (*) to describe this system should tell you all you need to know :-)
(*) Massively Unified Code Key