Relational tables should contain data and not logic – correct

database-designdesign-pattern

Ok, I couldn't find a better title, to be honest.

I'm recently landed in a new working place, and I've been asked to help with the DB. I wanted to try to explain a concept, but I'm not sure how to word it at best, and maybe this concept has already a better explanation and a name, too. I don't label myself as a DB expert, so I'd like to present them something more official than my own point of view, no matter how many good DBs I've designed in my life.

One of the things I've found mostly amusing (but not in a good way) is that they seems to follow design the db according to the logic of the problem and not according to "this is a database, it should contain data", so for example to know if an element is in a certain category, the query goes this way (in pseudocode):

SELECT Column1, Column2...ColumnN 
FROM Invoices [LIST OF JOIN]
WHERE
Invoces.Field1 IN 
    (TableA WHERE TableA.Field NOT IN 
        (TableB WHERE TableB.Field IN ((TableC WHERE) OR (TableD WHERE)))
OR Invoces.Field1 NOT IN 
    (TableB WHERE TableB.Field IN ((TableC WHERE) AND (TableD WHERE))
OR TableB.Field IN 
    (TableC WHERE TableC.Field IN (TableC WHERE))
OR TableB.Field NOT IN 
    (TableC WHERE TableC.Field NOT IN (TableD WHERE))

and so on and on for other twenty conditions or so. No, I'm not joking.

This happens because instead of saying "this is an invoice and its category is an attribute of the invoice itself", they go all the way with things like "the invoice is been inserted by that user, in that day, and he could have inserted it into a certain department or none at all, and in case of the department then that department could be part of a certain structure…" and so on. It's not even a matter of referential integrity, there are no FKs at all in this case…

Another practical example is the date of last edit of a document: instead of being stored as an attribute, it's calculated by a function which goes something along this way:

if that document is related to another, then if the other is this
kind, then if the first document is related to this other document
too, then the date of last edit is the date when the administrator has
printed the daily appointment calendar"…

clearly with a bunch of else for every if, too. AND, it's not because there is some sort of dependencies, it's just because many things are stored to reflect their logic.

In a way I'd be tempted to say "write once, read many", with the meaning of "write that attribute once and then read it for free as many times as you need", because we have a low frequency of writes and a very high frequency of reads, and clearly all reads must every time recalculate everything. But I've got a feeling there must be some principle that says that you should design databases to hold data…

Best Answer

It Depends

The purpose of a database is to store valid data in a secure fashion for multiple users across multiple applications.

Some of the logic will focus on the valid data portion of that statement. These will be your constraints and data types.

Other parts of code within the database will focus on the multiple applications portion. For example, a VIEW will hide some complex logic to ensure that all applications see the data in an identical manner. One application could be a web based application, the other application could be a 3rd party report generator.

OLTP

Within an OLTP system, the complex SELECT statement and the last modified date logic doesn't make much sense.

If this is suppose to be an OLTP system, you should try to change things so that they follow normal relational database practices.

Data Warehouse

The complexity of your pseudo-code SELECT statement and your date of last edit logic makes perfect sense in a Data Warehouse [DW] environment. However, they are probably part of an ETL/ELT process for materializing the data into a denormalized table for faster report generation.

The TABLES and VIEWS and MATERIALIZED VIEWS are built to solve a very specific set of business problems. From my experience, those objects are used as the source for various Business Reports. In some cases, one table/view per report is created to solve a Business Requirement.

No FKs? That's probably normal for a DW. The data in the tables could be refreshed with TRUNCATE followed by INSERT...SELECT. In this situation, a Foreign Key could do more harm than good.

As I read the comments, it seams your company needs to move to a more formal OLTP hierarchical design. I wish you luck.

Related Solutions

Normalization: Should common columns of two tables be pulled out into a third table

I see 3 options

Do nothing
Superkey/subtype, that is put the common fields into empl
Change to a self referencing table to build a hierarchy

Now,

Option 2 is similar to what you've described (mostly)
Option 3 is where you'd use a recursive CTE

However, I can't help thinking that Option 3 or some variety would be better. Perhaps some more info about what the columns mean? (Personally I don't abbreviate column names because the names are opaque)

Designing a Products database: One Products table or many separated by vendor

I would go with number 1. Having a separate table for each vendor or each product type will be a nightmare when trying to run reports that need data from multiple product tables. You don't want to have to create a new table each time you add a vendor or product type. I understand your inclination coming from a spreadsheet, but with the ability to query the records easily, separating the products into multiple tables is not the way to go.

I am inclined to say that there is a downside because products of one category will have many NULLs in attributes meant for other categories, but I'm not sure if this is actually a negative.

To prevent this, attributes should be in a separate table.

For example, a partial table definition may be as follows:

products
-- id
-- name

product_attributes 
--product_id
--name
--value

attributes.product_id is a foreign key of products.id.

To further normalize it you could define the attributes in a separate table and have an attribute_id field instead of attribute_name:

    attributes
    --id
    --name

    product_attributes 
    --attribute_id (foreign key of attributes.id)
    --value

(edited for spelling)