One to at most one relation or multiple null columns

database-design

I have a table with many attributes that can be grouped in logical groups, and thus the idea of putting these attributes in separate tables seems attractive. The logical groups don't represent entities themselves. When one attribute in the group is set, most of the rest of the attributes in that group also must be set (but not all). The fields's type of the groups generally are VARCHAR(15-20). There's no BLOB or TEXT fields either.

The logical groups don't represent subtypes, because they aren't mutually exclusive.

The entity is requested more for reading than for writing. Hence a large table looks appropriate. Also, this way the joins are avoided in queries. The part of this approach I don't like is the large number of nullable fields.

Looking for Expert advice.

Best Answer

There is nothing wrong with nullable columns, even a lot of them, if that is what your data domain calls for.

On the other hand, the fact that you say these columns can be grouped logically implies to me that something else might be going on.

If they can be grouped logically because different sets of columns apply to different sets of rows, then you might have an entity-subtype situation.

Conversely, if columns can be grouped because they apply at different times, then you may have a normalization issue. For example, if your columns are something like "January Sales", "February Sales", etc. these should be rows in a child table.

While there is nothing innately wrong with nullable columns, neither is there anything wrong with joining. It's what RDBMS does for a living.

UPDATE:

Given additional information about the logical groups of columns:

There are two kinds of sub-typing that can be represented in a database using 1:1 relationships. If the logical groups were mutually exclusive, then the parent entity could have what is known as a partitioning attribute that tells you which one of the subtypes is applicable. However, without a partitioning attribute, it is possible to have zero, one or even multiple subtypes being applicable at the same time.

The same fundamental question applies then to what you do with this situation.

A good way to resolve it would be to look at the logical groups of columns. Are the columns in logical group A the same as in logical group B - or are the totally different? If they are different they might be best modeled in the single table with nullable fields. If they are the same, then this might be a clue that they should be multiple child rows instead.

Another clue to look at is whether it makes sense that a logical group of columns could take on a life of its own and start attracting relationships from other tables. If logical group B might sometime soon find itself with multiple child records from another table, then it might be a sign that it makes sense to promote that group to its own subtype style table.

One last thing to consider is physical implementation. If a logical subgroup is very sparsely populated, you might be able to make a case for segregating these columns into another table to optimize physical storage. This step shouldn't be done proactively. This kind of optimization should be done when performance testing proves it is necessary.

If none of these things are true, then you are probably best off leaving the nullable columns in the original table.