Meta-relationships (Relationships between relationships) versus tuples

database-designmany-to-manyrelations

I am designing a database which needs to store some fairly complex relationships among many different types of entities.

In the simplest example, let us say I have three types of entities – "Student", "Tutor", and "Course", each with their own tables (student, tutor, course). First of all, I need to represent simple many-to-many relationships between each possible pair of entity types:

  • student_course – Each student can be enrolled in zero or more courses, and each course can have zero or more students. (ex: "Alex is enrolled in Biology").
  • student_tutor – Each student can be working with zero or more tutors, and each tutor can be assigned to zero or more students. (ex: "Emma is assigned to Alex").
  • course_tutor – Each tutor can be approved to teach zero or more courses, and each course can have zero or more approved tutors. (ex: "Emma knows how to tutor Physics").

Easy enough. However, I also need to represent some more advanced relationships, like:

  • "Tutor John is working with students Alex and Maria on course Chemistry"
  • "Tutor John is working with student Alex on course Algebra"

Notice that these are not captured by the binary many-to-many relationships I described earlier – John can be tutoring Maria, and Maria can be enrolled in History, but that doesn't mean that John is tutoring Maria in History.

The way I see it, I have two options:

Option 1 – A single table containing triplets

Table student_course_tutor:

| student_id | course_id | tutor_id |
|------------|-----------|----------|
| 1          | 1         | 1        |
| 1          | 1         | 2        |
| 1          | 2         | 5        |

This is nice because it explicitly lays out the full relationship in a single row. However, I now need to maintain this table in addition to my simple relationship tables.

Option 2 – Meta-relationships

Table student_course:

| id | student_id | course_id |
|----|------------|-----------|
| 1  | 1          | 1         |
| 2  | 1          | 2         |

Table student_course__tutor:

| student_course_id | tutor_id  |
|-------------------|-----------|
| 1                 | 1         |
| 1                 | 2         |
| 2                 | 5         |

Since I was going to have a student_course table anyway, adding this "meta-relationship" table is less redundant than representing the full triplet. However, it now becomes a lot more abstract and difficult to understand.

Which way is more correct?

Best Answer

The two examples are not equivalent and interchangable implementations; they embody different semantics.

In the first, the three-way table imposes no conditions on the participants. The example shows this. Alex is enrolled in Biology, Emma is assigned to Alex and Emma knows how to tutor Physics. The tutor's subject is Physics but has a student who's enrolled in Biology and there's nothing in the model to stop them discussing, say, Economics.

In the second, the tutor_id can only be associated with a pre-existing student_course_id. So Alex has to be enrolled in Biology before Emma can become his tutor. Since there's only one course_id (through the associated student_course) we can assume this is what they will discuss in their tutorials. However, there is still no way of asserting that Emma is part of Alex's faculty.

If you're still working out the logical data model I would suggest you skip the surrogate keys and use only the natural keys for now i.e. drop the various "id" columns and only use course.name, student.name and tutor.name. Surrogate keys are great performance enhancers in real DBMS implementation but are not required when understanding and documenting the problem. They can be substituted in later when you're confident you have solved your problems.

Next you need to understand the constraints on the data and the questions the DB has to answer. For example a constraint may be that a tutor can only work in the subject which she is employed. A question may be "who's available to tutor a Chemistry student?" Once you have these, the tables to enforce and answer them will emerge, as will the foreign key constraints.

You may be content with any tutor teaching any subject to any student. That's OK if it is so, I can't tell form the question. Or you may choose to enforce this outside of the database - in the application, say, or through a written policy enforced by management. The important thing is to understand the rules as they are, and how they're implemented, so that when they change the appropriate adjustments can be made.

Do not be afraid of having lots of tables with a few rows if this is what the data demands. Would you bolt on a new clickstream half way through an existing web site just because you "were going to have it anyway"?