Postgresql – A question about foreign keys and many to many relationships

database-designforeign keymany-to-manypolymorphic-associationspostgresql

My question is mostly conceptual in nature. I have a database schema that involves a many-to-many relationship between a School table and Student table. This seems sensible to me, given that a school can have many students, and a student can attend many schools. I'm hung up on the problem of whether to include the school as a foreign key in the tables that are directly beneath the Student table in the information hierarchy (Behavior, Attendance, Grade).

In short, the many-to-many relationship between School and Student allows for the possibility that a student referenced in the Attendance table would have multiple schools listed in the linked Student table. This could result in unwanted ambiguity when going over attendance records. (i.e. Did they miss class in school A or school B?) My solution is obvious: include a School foreign key for any tables beneath Student to make their relationship to a given school explicit.

Ugly data flow

The problem: this seems inelegant. Aesthete that I am, I would like the information to flow through tables, instead of jumping tables. A convenient solution I've encountered would be to rely on a polymorphic association, wherein the primary key passed to the tables beneath Student (Behavior, Attendance, Grade) provides unique values for both the student and school. Unfortunately, it appears that SQL won't tolerate that approach.

Is there a solution that allows me to enjoy the apparently sensible data flow offered by polymorphic association (illustrated below) without breaking the rules of SQL, or am I trying to have my cake and eat it to?
preferred (pretty) data flow

Best Answer

Given that a school can have many students, and a student can attend many schools

.. you should have a many-to-many relationship between school and student - implemented as separate table. (school_id is misplaced in the table student.) I'll simply name it school_student:

CREATE TABLE school_student (
   school_student_id int GENERATED ALWAYS AS IDENTITY PRIMARY KEY  
 , school_id  int NOT NULL REFERENCES school
 , student_id int NOT NULL REFERENCES student
 , ...
);

And then use school_student_id in tables behavior and attendance.
Alternatively, you can use a multicolumn PK on (school_id, student_id) and carry over (school_id, student_id) as FK - effectively implementing what you had in mind - with standard SQL. Like:

CREATE TABLE school_student (
 , school_id  int NOT NULL REFERENCES school
 , student_id int NOT NULL REFERENCES student
 , ...
 , PRIMARY KEY (school_id, student_id)
);

You don't need that for grade, since that references a course bound to a school already. But you might implement something to enforce integrity so that students can only attend courses at schools they are enrolled at. If you use (school_id, student_id) as FK, that can be enforced automatically - using a single copy of school_id for both FKs in table grade.

Related: