Normalizing the Test Table

database-designdenormalizationnormalization

I'm a web developer looking to leverage more of the power of my RDBMS. I'm currently learning about normalization of data and database design.

In the book that I am reading, we started off with one table that contained information about students, teachers, tests, grades, etc, and broke that down into separate tables for each entity.

However, I'm looking at the test table (see picture below), and I am wondering if this table also violates a few design principles.

Test Table

In particular, I think that the Test column violates the "no redundant data" principle, and the "no partial dependencies principle". (On this second point, I am assuming that the primary key for this table is a composite key of TestID and TeacherID, in which case the Test column depends only on the TestID part of the primary key).

What I would do in this case is further break up the test table into a Tests table containing the a TestID as a primary key, Name as the name of the test, and TotalPoints. Then, I would create a AdministeredTests table that contains a TestId, a TeacherId, and a Date, and the primary key would be a composite of TestId and TeacherId.

Would I be on the correct track here? I'm asking because the author also shows the following table:

Formats

Again, this suffers from redundant data (although it has no partial or transitive dependencies). Based on my current understanding, I think the author simply has not introduced join tables yet, hence the reason why normalization "stops" at a certain point. Also, the author is quick to point out: "…we presented one possible database design as an illustration of how to normalize data. In truth, many possibilities exist for designing this database." Just wondering the possibility I presented here is reasonable.

Best Answer

The problem here is the approach. Looking at table values and guessing keys is not a way to go.

Simply write down the predicate and constraints for the table, everything else follows from this.

Here is one example. If your business rules (predicate and constraints) are different then keys are going to be different too, so table may or may not have to be decomposed.

Note:

[Px]   = predicate x
[cx.y] = constraint x.y

KEY = PK or AK

PK  = PRIMARY KEY
AKn = ALTERNATE KEY (UNIQUE)
FKn = FOREIGN KEY

[P1] Test number (TEST_ID) of test type (TEST) with total score of (TOTAL_POINTS) was administered on date (DATE) by teacher (TEACHER_ID).

[c1.1] Test is identified by test number.

[c1.2] Each test is of exactly one test type; for each test type it is possible that more than one test is of that test type.

[c1.3] Each test has exactly one total score, for each total score it is possible that more than one test has that total score.

[c1.4] Each test was administered on exactly one date; for each date it is possible that more than one test was administered on that date.

[c1.5] Each test was administered by exactly one teacher; for each teacher it is possible that more than one test was administered by that teacher.

test {TEST_ID, TEST, TOTAL_POINTS, DATE, TEACHER_ID}
 KEY {TEST_ID}

As far as the TEST column is concerned, it simply contains valid values from the domain of this attribute. You may for example add a predicate.

[P2] Test type (TEST) exists.

test_type {TEST}
      KEY {TEST}

and populate with valid domain values

test_type (TEST)
{
  ('Pronoun Quiz')
, ('Solids Quiz')
, ('China Test')
, ('Grammar Test')
}

Use this to constrain values in the test table, so it would look like this:

test {TEST_ID, TEST, TOTAL_POINTS, DATE, TEACHER_ID}
  PK {TEST_ID}
 FK1 {TEST} REFERENCES test_type {TEST}

When considering physical design, you may want to use an integer for this foreign key -- so this may look like:

test_type {TEST_TYP_ID, TEST}
       PK {TEST_TYP_ID}
       AK {TEST}


test {TEST_ID, TEST_TYP_ID, TOTAL_POINTS, DATE, TEACHER_ID}
  PK {TEST_ID}
 FK1 {TEST} REFERENCES test_type {TEST_TYP_ID}

And so on; however, without explicitly defining business rules -- in form of predicates and constraints -- everything is open to misinterpretation and endless arguing.