Is it always bad to have a separate table for value objects?

database-design embedded

I know that value objects should be embedded in the table of the object they belong to, but what if the value object is more than a simple object and contains many fields/columns, or even sub-objects?

I currently have a table named content which, in short, stores rich text. Since I don't always parse the content, I've added a caching system. The table also has a 1:n relation to another table, which contains a list of all the links found in the stored text.

content(content_id[PK], text)
content_link(link_id[PK], content_id[FK], object_id)
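
In concrete (if simplified) DDL the layout looks something like this; the column types here are assumptions, only the column names come from the schema above:

    CREATE TABLE content (
        content_id  INTEGER PRIMARY KEY,
        text        TEXT NOT NULL               -- the rich text itself
    );

    CREATE TABLE content_link (
        link_id     INTEGER PRIMARY KEY,
        content_id  INTEGER NOT NULL REFERENCES content (content_id),
        object_id   INTEGER NOT NULL            -- the object the link points to
    );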

If I put this data in the original table I would follow the rules of database design, but maybe it would make the work harder.

Best Answer

Normalization is a set of techniques to avoid certain problems, namely data update anomalies. In a perfect world you would not store structured content but would split it into its constituent parts across however many tables are required and re-construct each complex object at runtime (first normal form).

However, if pre-parsing the object and storing the interesting / frequently used bits somewhere else resolves a performance issue, then do this, as long as you do it with your eyes open. If you, and everyone else who may code against the database, understand that extra work must be done in the application in order to achieve the necessary response times, then that's a design choice, and necessary in the real world.


Response to comment:

My reading of your question is that content.text contains structured text in which links are embedded. Your application shreds content.text from time to time to find and use these links. This is expensive, and as a performance optimisation you would like to store the links separately, in content_link.

This is where the update anomalies can creep in. If content is updated so that content.text has a different set of links, but content_link is not also updated, then there is inconsistent data in the database. It works the other way, too: a row can be deleted from content_link without content.text changing and, again, the data is inconsistent. This is why the application has to be diligently coded and tested.
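
To make that concrete, here is a minimal sketch of the first anomaly using the schema from your question; the literal values are illustrative only:

    -- The application rewrites the text so it no longer links to object 42...
    UPDATE content
    SET    text = 'new rich text, with the link to object 42 removed'
    WHERE  content_id = 1;

    -- ...but the matching clean-up of content_link is forgotten. This row
    -- now records a link that content.text no longer actually contains:
    SELECT * FROM content_link WHERE content_id = 1 AND object_id = 42;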

There is another rare-but-possible case to consider, too. Changing the data will require two statements: one to update content and one to update content_link. RDBMSs use pre-emptive multitasking, meaning that at any time the DBMS may choose to halt one workload and run another instead. It could, for example, halt the updating session after the first update statement and before the second. If there is another workload trying to read content or content_link at that point, it may run and get inconsistent values, if the reading transaction's isolation level allows it to. As I said, rare but possible.
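
One way to close that window, sketched here on the assumption of an RDBMS with explicit transactions and readers running at READ COMMITTED or stricter, is to issue both statements inside a single transaction, so other workloads see either the old state or the new state but never the mixture:

    BEGIN;  -- START TRANSACTION in some products

    UPDATE content
    SET    text = 'updated rich text, now linking to object 42'
    WHERE  content_id = 1;

    DELETE FROM content_link     -- drop the stale link rows...
    WHERE  content_id = 1;

    INSERT INTO content_link (link_id, content_id, object_id)
    VALUES (7, 1, 42);           -- ...and store the freshly parsed ones

    COMMIT; -- both changes become visible to other transactions together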

If these problems are acceptably unlikely in your application, and the implications of them happening are not severe enough to worry about, then denormalisation can deliver performance advantages. If the risk of data inconsistency is too great, then you'll have to find other solutions.