Database Design – Is Combining Independent Tables Called Normalization?

database-designdenormalizationnormalizationrelational-theoryterminology

So, at workplace, we currently store our audit logs in different tables, depending on what it is e.g. login/access info, configuration changes, etc.

These are all independent data with no foreign key or any relationship whatsoever. Some of the columns are similar, e.g. ID (obviously), datetime, name of user making the change, etc, while others are different.

Recently, I'm asked to combine all the tables into one, where the similar columns will be preserved, while the different columns will be in JSON, stored in a CLOB column.

I'm being told that this process is called "normalization" – we're converting the many tables into "normal forms".

Now, I'm not a database expert, but that doesn't seem to match what I've learned about normalization in my Intro to Database class. Or am I just ignorant?

Best Answer

No, the task you are being asked to fulfill —arranging (a) columns that belong in two or more distinct base tables in (b) a single base table— is not called normalisation (nor normalization for those who write it so).

Regarding the relational model, crated by Dr. E. F. Codd, normalization is a science-based procedure performed by a designer at the logical level of abstraction of a database which involves:

  • decomposing, by means of first normal form, a relation (table) with one or more attributes (columns) configured with nonatomic1 domains (types) so that, having all the relevant domains (types) fixed as simple or atomic, data manipulation and constriction are much easier to declare via the expressive facilities of the data sublanguage (e.g., SQL) provided by the database management system (DBMS for brevity) of use; and

  • getting rid of undesirable dependencies among the attributes (columns) of a relation (table), by virtue of split-ups based on the successive normal forms, in order to avoid update anomalies.

Of course, the designer has to take into account the meaning carried by the relations (tables) and attributes (columns) under consideration. The aforementioned meaning is assigned in terms of a modeling exercise that takes place at the conceptual level of abstraction, depending on the particular informational characteristics and rules of the business environment of interest. It is there, at the conceptual level, where the modeler carries out an integration of the structural aspects (entity types and properties) that hold business-related direct associations. Those conceptual elements are subsequently represented in a logical-level layout by way of the relations (tables) and attributes (columns) mentioned above, and normalization assists in testing the soundness of the logical-level representation.

Denormalization, in turn, is as well a science-based procedure applied at the logical level where, basically speaking, a designer takes a relation (table) that meets a certain normal form and produces a relation (table) that is in the normal form that is immediately “under” the previous one; e.g., a relation (table) that was in third normal form underwent a recomposition to its second normal form, so this procedure is sequential. Denormalizing a relation (table) that complies with first normal form generates an unnormalized one.

Although, in certain contexts, the term “denormalization” is informally (and ufortunately) used to denominate the approach in which a designer decides to put up —in an ad hoc manner— a table that includes, e.g., “summary” columns that retain calculated values and/or columns that belong and are also comprised in other tables, but that is decidedly a course of action which is not associated with the —theory-based— normal forms of the relational model. The task you describe (i.e., mixing in the same table some columns standing for aspects that are not directly associated at the conceptual level) seems to be compatible with this informal (and unfortunate) use of the term “denormalization”, but note that, for the record, I highly suggest not to use it when referring to this sort of approach because it only adds ambiguities and confusions.


Endnote

1 Domain (which is somewhat similar to a type) atomicity has much to do with the way in which an attribute (column) of a relation (table) will be utilized in the concerning database. A nonatomic domain is one that comprises one or more sub-structures that have their own specific significance and require separate constraints and/or will be involved in data manipulation operations accessing value sub-parts enclosed in said sub-structures (e.g., SELECT or UPDATE operations with WHERE clauses incorporating DBMS functions —let us say, SUBSTRING()— to transgress domain atomicity by “touching” value sub-parts).

In his seminal 1970 paper entitled A Relational Model of Data for Large Shared Data Banks, Dr. Codd expounds examples of relations (tables) with domains that accept relations (tables) as values; in other words, the concerning relations (tables) contain “nested” relations (tables) in some of their attributes. In practice, this kind of domain entails columns set up with, let us say, a TABLE data type which, if not handled as atomic, involves evident complexity (recursive “nesting” of columns, constraints, rows and the corresponding operations). To free those relations from said complexity, he presents normalization, the procedure by virtue of which a designer breaks down nonsimple (i.e., nonatomic) domains into atomic ones, obtaining relations (tables) in a more convenient (first normal) form. It is worth to note that, in agreement with the prior paragraph, attributes (columns) defined with domains accepting relations (tables) as values are not the only ones that can be considered nonatomic.

In this way, if each of the columns taking part in the combination process that you mention will always be constrained and used in value manipulation as a single indivisible unit, i.e., never with respect to value sub-parts, then their types will be managed atomically; if not, such a combination process would yield unnormalized tables, i.e., tables that do not comply with first normal form and, hence, not with the further normal forms either, so genuine relational normalization would be called for.