Database Normalization – Identifying Functional Dependencies

dependenciesnormalization

I'm having trouble identifying the combination of attributes which make up Functional Dependencies(FDs) of a particular table. While I understand how Normalization process works when given the FDs, I'm unable to translate and identify FDs for a given table.

I've been tasked to create a database of a company which tracks all expenses of sales/packages etc. Naturally that means storing Customer information for the company to identify all consumers.

CREATE TABLE Customers (
    id        integer primary key,
    address   text not null,
    name      text not null,
    email     text unique not null,
    phone     text unique not null,
    unique(name, address)
);
    

I've chosen to represent the info for the customer table as such, given that each phone number and email is uniquely tied to the customer. The unique constraint allows for multiple people from the same household address to be present in the table.

I want to see if I can normalize this table further, however I am unable to identify the attributes to determine the FDs needed.

Best Answer

A set of attributes Y is functionally dependent from another set X if and only if Y is uniquely determined by X, that is you cannot have two different values of Y associated to the same value of X. So, for instance, assuming that a phone number is tied always to only one person, and that a person has always only one address, you can say that the functional dependency phone -> address holds, since, given a certain phone number, only the address of the person with that number can appear together with it.

So a functional dependency specifies in a formal way an important fact of the reality that is modeled in your database.

A problem with functional dependencies is that you can have a lot of them, even for small relations. A clever way of solving this problem is to define only a small set of dependencies, sufficient to “capture” all the essential information about a certain reality, since we can derive all the other dependencies from this set in a mechanical way (and there are programs that can do this, for instance by applying the well-known “Armstrong’s axioms”). However, sometimes it is not so easy to define such “essential” set of functional dependencies and we must look carefully at the specification of the problem that we are modeling.

Usually we start from the so-called “candidate keys”, that is the attributes or the sets of attributes that are “naturally” unique inside a relation. Since they are unique, we know that they determine all the other attributes. In your example, you have already identified four candidate keys, id (the primary key), (name, address), phone and email (declared unique), so we can already say that:

id -> name, address, email, phone
name, address -> id
phone -> id
email -> id

(note that is enough to say that an attribute determines a candidate key, because from this fact we can derive that it determines all the other attributes, that is that it uniquely determines a person).

Then we can see if some other attribute of combination of attributes uniquely determines either other attribute(s), or a candidate key, (which is equivalent to determines all the other attributes).

There are other (non-trivial, interesting) functional dependencies in your relation? Probably no, given the obvious semantics of the data (for instance two different persons can have the same address, so an address does not “uniquely determines” anything, or two different persons can have the same name, etc.). So we could have a certain degree of confidence that the four FDs are an “essential” set of FDs (technically this is called a “cover” of the set of FDs of the relation).

And so from this cover we can start to normalize the relation (that is already normalized, in this case).