Database Design – Is Combining Independent Tables Called Normalization?

database-designdenormalizationnormalizationrelational-theoryterminology

So, at workplace, we currently store our audit logs in different tables, depending on what it is e.g. login/access info, configuration changes, etc.

These are all independent data with no foreign key or any relationship whatsoever. Some of the columns are similar, e.g. ID (obviously), datetime, name of user making the change, etc, while others are different.

Recently, I'm asked to combine all the tables into one, where the similar columns will be preserved, while the different columns will be in JSON, stored in a CLOB column.

I'm being told that this process is called "normalization" – we're converting the many tables into "normal forms".

Now, I'm not a database expert, but that doesn't seem to match what I've learned about normalization in my Intro to Database class. Or am I just ignorant?

Best Answer

No, the task you are being asked to fulfill —arranging (a) columns that belong in two or more distinct base tables in (b) a single base table— is not called normalisation (nor normalization for those who write it so).

Regarding the relational model, crated by Dr. E. F. Codd, normalization is a science-based procedure performed by a designer at the logical level of abstraction of a database which involves:

decomposing, by means of first normal form, a relation (table) with one or more attributes (columns) configured with nonatomic¹ domains (types) so that, having all the relevant domains (types) fixed as simple or atomic, data manipulation and constriction are much easier to declare via the expressive facilities of the data sublanguage (e.g., SQL) provided by the database management system (DBMS for brevity) of use; and
getting rid of undesirable dependencies among the attributes (columns) of a relation (table), by virtue of split-ups based on the successive normal forms, in order to avoid update anomalies.

Of course, the designer has to take into account the meaning carried by the relations (tables) and attributes (columns) under consideration. The aforementioned meaning is assigned in terms of a modeling exercise that takes place at the conceptual level of abstraction, depending on the particular informational characteristics and rules of the business environment of interest. It is there, at the conceptual level, where the modeler carries out an integration of the structural aspects (entity types and properties) that hold business-related direct associations. Those conceptual elements are subsequently represented in a logical-level layout by way of the relations (tables) and attributes (columns) mentioned above, and normalization assists in testing the soundness of the logical-level representation.

Denormalization, in turn, is as well a science-based procedure applied at the logical level where, basically speaking, a designer takes a relation (table) that meets a certain normal form and produces a relation (table) that is in the normal form that is immediately “under” the previous one; e.g., a relation (table) that was in third normal form underwent a recomposition to its second normal form, so this procedure is sequential. Denormalizing a relation (table) that complies with first normal form generates an unnormalized one.

Although, in certain contexts, the term “denormalization” is informally (and ufortunately) used to denominate the approach in which a designer decides to put up —in an ad hoc manner— a table that includes, e.g., “summary” columns that retain calculated values and/or columns that belong and are also comprised in other tables, but that is decidedly a course of action which is not associated with the —theory-based— normal forms of the relational model. The task you describe (i.e., mixing in the same table some columns standing for aspects that are not directly associated at the conceptual level) seems to be compatible with this informal (and unfortunate) use of the term “denormalization”, but note that, for the record, I highly suggest not to use it when referring to this sort of approach because it only adds ambiguities and confusions.

Endnote

_{¹ Domain (which is somewhat similar to a type) atomicity has much to do with the way in which an attribute (column) of a relation (table) will be utilized in the concerning database. A nonatomic domain is one that comprises one or more sub-structures that have their own specific significance and require separate constraints and/or will be involved in data manipulation operations accessing value sub-parts enclosed in said sub-structures (e.g., SELECT or UPDATE operations with WHERE clauses incorporating DBMS functions —let us say, SUBSTRING()— to transgress domain atomicity by “touching” value sub-parts).}

_{In his seminal 1970 paper entitled A Relational Model of Data for Large Shared Data Banks, Dr. Codd expounds examples of relations (tables) with domains that accept relations (tables) as values; in other words, the concerning relations (tables) contain “nested” relations (tables) in some of their attributes. In practice, this kind of domain entails columns set up with, let us say, a TABLE data type which, if not handled as atomic, involves evident complexity (recursive “nesting” of columns, constraints, rows and the corresponding operations). To free those relations from said complexity, he presents normalization, the procedure by virtue of which a designer breaks down nonsimple (i.e., nonatomic) domains into atomic ones, obtaining relations (tables) in a more convenient (first normal) form. It is worth to note that, in agreement with the prior paragraph, attributes (columns) defined with domains accepting relations (tables) as values are not the only ones that can be considered nonatomic.}

_{In this way, if each of the columns taking part in the combination process that you mention will always be constrained and used in value manipulation as a single indivisible unit, i.e., never with respect to value sub-parts, then their types will be managed atomically; if not, such a combination process would yield unnormalized tables, i.e., tables that do not comply with first normal form and, hence, not with the further normal forms either, so genuine relational normalization would be called for.}

Related Solutions

PostgreSQL – Combining Multiple ‘Type’ Tables When Normalizing

I would go for separate tables for each type. With so few rows performance will not be a consideration either way. Individual tables will only have a page or two each. Similarly for the combined table.

Either way you will have to join a "type" table to complete your views. Inserts are infrequent at this level of abstraction so locking is unlikely to be an issue either way.

For me, though, having different tables keeps logically distinct items separated, which is a cleaner design.

There will be many different tables which will have a relationship to a type table of some sort. If there are many different type tables the foreign key relationships are self-documenting and enforcable through DRI. With one combined table it is difficult to use DRI to say "Venue.TypeID comes from Type.TypeID but only if the Type.Category = 'V'", for example. The complementary side of this relationship - "Type.TypeID with Type.Category = 'V' can only be used in the Venue table" - is also not supported in any RDBMS's DRI that I know of.

Should I create separate table or have the information as columns in the same table

If you really want to normalize your design, here is a suggestion to do so. You should be able to create foreign key constraints. This is without the extra Instruction table since all documents will have the instruction details, stored in the Document table:

Executor
----------
ExecutorID


Document
----------
DocumentID
ExecutorID
Deadline
Target
-- other columns, relative to all documents


DocumentA
----------
DocumentID
-- other columns, relative only to documents of type A


DocumentB  
----------
DocumentID 
-- other columns, relative only to documents of type B

and their relationships:

              Executor
                /1
               /
              /
             /0..N 
         Document
         1/    \1
         /      \
        /        \
   0..1/          \0..1
 DocumentA      DocumentB

*Note: the only constraint not shown (and enforced) with the above design is that a document has to be of either A or B type (and not both.) This can be enforced as well, with a somewhat more complex design.

Best Answer

Endnote

Related Solutions

PostgreSQL – Combining Multiple ‘Type’ Tables When Normalizing

Should I create separate table or have the information as columns in the same table

Related Question