Are updates the only disadvantage of denormalization

database-designdenormalization

I am trying to understand what are the disadvantages of denormalization. Say that I have the following database (School_has_Student is a denormalized table):

enter image description here

I have read that the problems that you may face when using denormalization are related to the operations of INSERT, UPDATE, and DELETE.

Now I understand why we have the UPDATE problem, because if I updated one piece of data, then I would have to update the rest of the related pieces of data (for example: if I updated the student name "Paul" in the Student table, then I would have to also update the student name "Paul" that exist two times in the School_has_Student table), but I don't understand why INSERT and DELETE are also problems.

Note: I know that the increase of storage space is also a disadvantage, but I do not think it is a big disadvantage with current storage devices which have very large capacity.

Best Answer

There are different kinds of denormalization. Say we can have the main table readouts like that:

CREATE TABLE `readouts` (
  `sensorID` INT(10) UNSIGNED NOT NULL,
  `timestamp` TIMESTAMP NOT NULL,
  `value` FLOAT(9,5) NOT NULL,
  PRIMARY KEY (`sensorID`, `timestamp`)
);
+----------+-----------+-------+
| sensorID | timestamp | value |
+----------+-----------+-------+

Table stores the readouts from the lot of sensors. If we need the last readout for each sensor we have to run the query like that:

SELECT a.*
  FROM readouts AS a
  JOIN ( SELECT sensorID
              , MAX(timestamp) AS timestamp
          GROUP BY sensorID
       ) AS b  ON b.timestamp = a.timestamp
              AND b.sensorID  = a.sensorID
 ORDER BY sensorID ASC
;

That is quite heavy query even when the table readouts is properly indexed. We can denormalize the database by creating the table of the same structure but the UNIQUE constraint for sensorID:

CREATE TABLE `last_readouts` (
  `sensorID` INT(10) UNSIGNED NOT NULL,
  `timestamp` TIMESTAMP NOT NULL,
  `value` FLOAT(9,5) NOT NULL,
  PRIMARY KEY (`sensorID`)
);

Now we have to INSERT incoming readouts to the both tables. The second table should be inserted in the special way:

INSERT INTO last_readouts 
     VALUES (sID, ts, val) 
     ON DUPLICATE KEY UPDATE 
        `timestamp` = ts
         value = val
;

This addtitional INSERT is the insert-related disadvantage of the denormalization. But the advantage is the minimal cost of the query

SELECT * FROM last_readouts;

The DELETE disadvantage is of the same sort - you need more operations of DELETE but you have some benefits instead.