Functional Dependency on Child Table in Third Normal Form (3NF)

database-designnormalization

Consider two simple tables, OrderHeader and OrderDetail which contain, respectively, the data shown below:

OrderHeader

| OrderHeaderID | CustomerID | OrderTotal |
|---------------|------------|------------|
| 1             | 124        | 14.5       | 
| 2             | 525        | 35.6       |
| etc           |            |            |

OrderDetail

| OrderDetailID | OrderHeaderID | ProductID |
|---------------|---------------|-----------|
| 1             | 1             | 415       |
| 2             | 1             | 52        |
| etc           |               |           |

The OrderTotal column can be calculated by adding the price of each product from the OrderDetail table. That sounds like a functional dependency! ie:

SUM(OrderDetail.Product.Price) → OrderHeader.OrderTotal

…but afaik it's not a violation of 3NF.

Is this actually allowed? If yes, is it disallowed in higher normal forms?

Best Answer

There is an informal dependency, but it's not a functional dependency. See the wiki article for the formal definition. Here's an excerpt in simple terms:

In simple words, if the values for the X attributes are known (say they are x), then the values for the Y attributes corresponding to x can be determined by looking them up in any tuple of R containing x.

There is no normal form definition which precludes summarizing child record values in the parent record. However, lots of people would suggest against doing this in transactional databases, particularly where there's a possibility that a child record could be updated (or deleted, or more child records added) such that there's a risk of the attribute in the parent becoming inconsistent with the aggregate value of the children.

If your OrderHeader table had a CustomerName column, then that would violate 3NF because of the transitive functional dependency on CustomerID.

Being informal, a lot of people would say that OrderTotal is denormalized because it's susceptible to things that look like insert, update and delete anomalies and so people draw a comparison with the formal definitions of normalization and normal forms (since these are also about avoiding insert/update/delete anomalies). There are some sticklers, though, who will be adamant that avoiding summary attributes in parent records is not the same thing as normalization per se.

Related Solutions

Is this relation in third normal form (3NF)

In the end whether the table is 1NF and 3NF depends on what the domain value of a city is. This answers whether or not you have atomicity.

In essence what normalization means in the abstract regarding address information is quite controversial and it is not an easy area to answer and so I personally believe that this must be addressed on a case by case basis, based on the semantic use of the data.

Otherwise consider the following addresses:

418 N Bradley St
Chelan, WA 98816

How many atomic fields would this require with maximum decomposition? So we consider this to be:

CREATE TABLE address (
    id int not null unique,
    street_segment char(1),
    street_name varchar not null,
    city_name varchar,
    state_province varchar,
    mail_code
);

That seems straight forward enough until you get an address that requires a cross-street as is common in South America, then we start to add fields. But now you have to track an address in Managua like:

Bo Altagracia km 3 1/2 carretera sur | Montoya 3c al Oe

Note that this is an ordinal address based on landmarks, distance and direction, and it does not fit into this data model at all and a normalized representation would require at least one additional table and some additional fields, and we can no longer enforce the same not null constraints.

But then this doesn't necessarily mean there is anything breaking 1NF if we just store the address as a text string up through and including the country.

If you have no reason to track countries, then one may see "London, England" as a valid representation of a city designation and then yes, we could see this as 1NF. In essence such a table has a city domain (where "London, England" is distinct from "London, UK" and "Paris, France" is distinct from "Paris, North Carolina"). This gives you a number of possible issues, but it does not pose classic normalization problems unless you need to track countries (here the country is just an incidental part of the city's domain value, not a domain in itself).

So in this case it may or may not be 1NF depending on how you define the domain of the address to be and what you have to track it relative to. If it is 1NF then it is also 3NF.

However if you break city and country into separate fields, then you have a normalization problem because city is dependent on country, and so to be truly 1NF you'd probably need to break out your regional hierarchies quite a bit more (a cities table, a state/province table, a country table).

A typical approach to this is exactly what you have done, which is to treat the address as a text string atomic domain, and not break things out any further.

Edit: Another example of 1NF problems using arrays

Suppose on PostgreSQL I am storing IP addresses and will be querying on octets. I might represent an IP address as a smallint[] array like: array[192,168,1,101,24] instead of a cidr representation like '192.168.1.101/24' and this does not break 1NF. Each array of smallints is distinct in its domain and each represents a single value of its domain. This does not break 1NF because in the domain of an IP address, each array represents a single value in its domain (and this is ensured by the fact that ordinality is important). This is a good example of why it is wrong to assume, for example, that the inclusion of complex data structures or arrays necessarily violates 1NF.

Finally if "this datatype can be decomposed and therefore it isn't atomic" breaks 1NF then so does every use of a datetime datatype....

TL;DR

1NF's atomicity requirement is violated only when the column stores two or more values within a given domain. That is not the case in the initial question so 1NF is not violated given the information shown. Given that 1NF is not violated and both name and address are functions of customerid, and not functions of eachother, the requirements of 3NF are met.

Is this table in third normal form

Here's one attempt (this is not a day to day activity for me, so I may be doing some weird errors below):

A and B are clearly candidate keys of R. Therefore C is the only non-prime attribute of R

R is in 3NF iff:

a) R is in 2NF

b) Every non-prime attribute of R (C) is non-transitively dependent on every superkey of R.

The superkeys of R is (A, B), (A, C), (A, B, C)

By reflexivity (A, C) -> C AND (A, B, C) -> C By composition of A->C and B->C we get (A, B) -> (C, C), i.e. (A, B) -> C

C is therefore non-transitively dependent on every superkey of R, and hence R is in 3NF.