MySQL Database Design – Constraining Inserts/Updates for Summary Tables with Complex Validation Logic

database-designmany-to-manyMySQLunique-constraint

I am designing some MySQL tables and am running into an issue that I believe can be solved by a uniqueness constraint, but not sure exactly how. My actual use case is complicated and confusing, so I've simplified it down to a meaningful analogy involving schools, classrooms and test grades (but note: this is not homework – I'm a developer trying to solve a real-world problem here).

Some domain entities and business logic for a backstory:

  • There are Students
  • Students take Classes
  • In the Classes you get Tests which are graded
  • Each Quarter students have to sign up for a particular CourseLoad, which is a pre-determined sets of classes to take. A CourseLoad is a particular set of Classes that are all taken at the same time (e.g. the Core Math CourseLoad might consist of: MATH 205, MATH 210, and MATH 326, etc.)

Here are my tables as they currently stand, and although I'm 100% open to refactoring them, I am pretty happy with the given design (also please note: please don't linger on naming conventions here, its the design which is most important!):

students
========
student_id : INTEGER PRIMARY KEY AUTO INCREMENT (hereafter 'IPKAI')
student_name : VARCHAR(100)
student_dob : DATETIME

classes
=======
class_id : IPKAI
class_name : VARCHAR(50)    // Ex: 'CHEM 101'

courseloads
===========
courseload_id : IPKAI
courseload_name : VARCHAR(50)   // Ex: 'Core Math Courseload', 'General Ed Courseload', etc.

// Classes and Courseloads are many-to-many: The same Class can appear inside multiple
// Courseloads, and the same Courseload can consist of many Classes
classes_to_courseloads
======================
classes_to_courseload_id : IPKAI
class_id : FK INT (classes)
courseload_id : FK INT (courseloads)

quarters
========
quarter_id : IPKAI
student_id : FK INT (students)
courseload_id : FK INT (courseloads)

tests
=====
test_id : IPKAI
student_id : FK INT (students)
quarter_id : FK INT (quarters)
class_id : FK INT (classes)
test_date : DATETIME
test_grade : INT

So finally, my problem:

A Test occurs inside a given Quarter (reporting period). A Test also belongs to a Class, which itself belongs to 1+ CourseLoads. A given Quarter also has a CourseLoad assigned to it (since the student must select which CourseLoad they are going to tackle that particular quarter).

I'm worried about validation here. Let's take a living, breathing example to make this understandable:

  • Given a Quarter with a quarter_id=14
  • A Student with student_id=508 signs up to take a 'Core Math CourseLoad' (courseload_id=5)
  • The Core Math CourseLoad consists of 3 Classes:
    • MATH 205 (class_id=8); and
    • MATH 210 (class_id=18); and
    • MATH 326 (class_id=92)
  • At some point during that quarter, the student takes a mid-term for MATH 210, which is test_id=38321 and the student receives a grade of 94

Given all this info, the tests record for this mid-term would look like:

test_id     student_id    quarter_id      class_id    test_date   test_grade
============================================================================
38321       508           14              18          2015-07-23  94

What I need to validate is: That it becomes impossible for the student to have a test record during a given quarter for a class that is not apart of that quarter's courseload. In other words:

test_id     student_id    quarter_id      class_id    test_date   test_grade
============================================================================
38326       508           14              21          2015-08-19  85

…should be an invalid entry and should violate some constraint or raise an error if we try to insert that as a new tests record. Because, class_id=21 is not a part of the Core Math CourseLoad.

I am looking for an elegant solution here. Again, I think this can be solved by introducing a UK to the tests table, but would consider any solution (even heavy table refactoring) if it means solving the problem correctly.

Best Answer

That sort of validation is not a candidate for a column or table level Constraint. You have two options depending on how you're interacting with this database:

  1. Validation is done via the 'front end' interface, it won't let you make 'bad' picks. You can then validate them again within a SQL query to double check if so desired.

  2. Validation only occurs within a SQL query, allowing the 'front end' to pass along potentially 'bad' data.

If you choose to validate in the SQL query you can either build explicit checks using typical IF THEN iterative logic or build it into your INSERT queries. To do the latter your tables need to contain the data in a relationship that mimics you validation scheme, which luckily it looks like your tables do. Something like the following should do what you're looking for (@ denotes variables, I'm assuming your passing those in or setting them ahead of time):

INSERT INTO Tests (student_id, quarter_id, class_id, test_date, test_grade)
SELECT student_id
       ,quarter_id
       ,class_id
       ,test_date
       ,test_grade
FROM (SELECT @student_id AS student_id
             ,@quarter_id AS quarter_id
             ,@class_id AS class_id
             ,@test_date AS test_date
             ,@test_grade AS test_grade ) AS InsertSet
        INNER JOIN quarters Q
            ON InsertSet.quarter_id = Q.quarter_ID
               AND InsertSet.student_ID = Q.student_ID
        INNER JOIN classes_to_courseloads CTC
            ON Q.courseload_ID = CTC.coarseload_ID
               AND InsertSet.class_id = CTC.class_id    

By turning the variable set into an inline table we can join against the other tables to insure the data is valid. We INNER JOIN to the Quarters table to insure the quarter_ID and student_ID match up, thus verifying he is enrolled for the quarter. We then INNER JOIN onto Classes_to_Courseloads to verify that the class_ID the test grade is for matches up with his courseLoad_ID the student_ID was signed up for in the Quarters table. If any of the JOINs in that query fail you'll get no results and thus won't INSERT anything into the Tests table.

As for throwing back a message or error you can run a SELECT query against Tests using the variables to try and return the test_ID, if you get a result the INSERT was a success, if you don't it failed.