A superkey (and therefore a candidate key) that does not contain all attributes must contain at least an attribute set that is the left side of a functional dependency. Otherwise it would not be possible to generate an additional attribute from the attribute set of this superkey and this contradicts the definition of a superkey.
But a subset of ABCDE containing at least one of C,AB,AE or DE is already a superkey and to be minimal it is necessary that it contains only one of these subsets.
So the only subsets that must me investigated are that that are supersets of AC. But a superset of AC cannot be a minimal superkey because C already is a key.
Therefore the only candidate keys are C,AB,AE and DE
The goal of 3NF (and BCNF) is to avoid situations when a set of attributes is functionally determined by something that is not a key (or, more formally, a superkey), which may lead to anomalies and redundancy.
Keeping that in mind, we can describe certain functional dependencies as either "bad" (if they prevent the relation from being in 3NF) or "good" (if they do not).
So your solution is wrong, because "your" R1 has the E->D
dependency and E is not R1's key. Determining a key for a relation is another topic which, I suppose, is beyond the scope of the task you need to accomplish (we are told that ABC is a key). But if you are interested, you can look the method up using "attribute closure" keywords.
Anyway, the E->D
dependency is "bad". But that is not the only problem. Another one is that your decomposition is not a "lossless-join" one. In other words, if we have some real world data, split the relation according to your proposition and try to join R0 and R1 back, the data will be corrupted. Data corruption is not what normalization is designed for.
We have to solve 2 problems:
- Decide which dependencies are "bad" (the definition of normal forms is the answer to this)
- Decide how to split relations to prevent data corruption (we have Heath's theorem to help us)
Getting back to your problem, finally.
In R, neither B, nor E is the key. So both B->E
and E->D
are "bad", and we need to take measures. On the contrary ABC->EF
is "good", because ABC is a key.
Currently R is in 1NF, because B is a part of a key and functionally determines E, which is not a part of a key. This kind of situation is unacceptable in 2NF.
E->D
dependency is not that "bad", because E is not a part of a key.
Another thing to mention at this point is that a set of dependencies {B->E, E->D}
implies the dependency B->ED
(according to so called Armstrong axiom of transitivity).
So we decide to split using B->ED
dependency as "the worst" one. That's the way we do this, according to Heath:
R = ABCDEF
R0 = BED (B->E, E->D, B is a key)
R1 = ABCF (ABC->F, ABC is a key)
Now we have R1, which is in 3NF and R0, which is in 2NF (it has a transitive dependency B->E->D
, which is unacceptable in 3NF).
We need to split R0 and to do so we have to choose a "victim" dependency among B->E
and E->D
. As we see, B->E
is now a "good" one (B is a key) and E->D
is still "bad".
Our split:
R0 = BED
R2 = BE (B->E, B is a key)
R3 = ED (E->D, E is a key)
R1, R2 and R3 are the final decomposition.
Best Answer
Any relation has multiple keys present in it. The primary key is the minimal key amongst these various candidate keys which we can choose from. Candidate keys are a minimal cover of attributes which determines a tuple. We can have many many candidate keys.
The reason A,B,C,D,E,F,G is a key is since it is not a minimal cover of attributes that determine the entire tuple. In the second example, B is not part of the minimal cover since removing it from the candidate key still makes the remaining attributes a candidate key!