Postgresql – Code review database design for storing college assessments & enrolments

database-designpostgresql

I'm desigining a database for a university testing tool – running tests in the cloud, very similar to a CI system.

It needs to store users, courses, enrolments, assessments, submissions & results.

It's not a system powering the entire institution, but it is important for a few different courses. I'd like the database to be well-structured, extensible, and "intuitive" for future maintainers.

Requirements:

  • Two distinct courses may have the same name (but be run in different years/periods)
  • A single course can only be run once per period (of which there are multiple per year)
  • Assessment names must be unique within a course, but not across other courses.
  • A user may be enrolled in a course as an instructor, tutor, or student, but not multiple at once. A student may become a tutor for the same course in the future.

I've included my schema below, with relevant comments where I feel they're needed.

CREATE TYPE PERIOD          AS ENUM ('summer', 'first', 'autumn', 'winter', 'second', 'spring');
CREATE TYPE ASSESSMENT_TYPE AS ENUM ('assignment', 'lab');
CREATE TYPE TEST_RESULT     AS ENUM ('warnings', 'errors', 'perfect');
CREATE TYPE ROLE            AS ENUM ('student', 'tutor', 'convenor', 'admin');

CREATE TABLE users (
    uid      VARCHAR(20)  PRIMARY KEY,
    email    VARCHAR(255) UNIQUE NOT NULL,
    password BYTEA               NOT NULL
);

CREATE TABLE courses (
    -- id is used instead of (code, period, year) to
    -- uniquely identify a course, since its easier.
    id     SERIAL       PRIMARY KEY,
    -- code is used within the university to identify a course,
    -- but not when it was run (a single course can be run
    -- multiple times, with different students/assessments/etc)
    code   VARCHAR(20)  NOT NULL,
    name   VARCHAR(255) NOT NULL,
    period PERIOD       NOT NULL,
    year   INTEGER      NOT NULL,
    -- ensure that a single course can only be run once per period.
    CONSTRAINT once_per_period UNIQUE (code, period, year)
);

CREATE TABLE assessment ( 
    id           SERIAL          PRIMARY KEY,
    name         VARCHAR(255)    NOT NULL,  
    type         ASSESSMENT_TYPE NOT NULL,
    comments     TEXT,
    course_id    INTEGER         REFERENCES courses(id),
    -- assessment names must be unique within a course,
    -- but not across courses.
    CONSTRAINT unique_name_per_course UNIQUE (name, course_id)
);

CREATE TABLE test_results (
    id            SERIAL      PRIMARY KEY,
    result        TEST_RESULT NOT NULL,
    warnings      TEXT        NOT NULL,
    errors        TEXT        NOT NULL
);

CREATE TABLE submissions (
    id            SERIAL      PRIMARY KEY,
    title         TEXT        NOT NULL,
    description   TEXT        NOT NULL,
    feedback      TEXT        NOT NULL,
    uid           VARCHAR(20) REFERENCES users(uid),
    assessment_id INTEGER     REFERENCES assessment(id),
    result_id     INTEGER     REFERENCES test_results(id)
);

CREATE TABLE enrol (
    user_uid  VARCHAR(20) REFERENCES users(uid),
    course_id INTEGER     REFERENCES courses(id),
    role      ROLE        NOT NULL,
    -- ensure a user cannot have multiple roles for a single course
    CONSTRAINT one_role_per_course_run UNIQUE (user_uid, course_id)
);

I'd like some feedback on my design – what can I do better? Is there anything I missed? Any common/idiomatic design choices?

I was going to post in the code review stackexchange, but I saw a previous answer that mentioned this as a better location.

Cheers

Best Answer

Just a few thoughts.

I would not use ENUMs. I hate to say "always" (open to correction on this) but I would "always" use a FOREIGN KEY. Consider gender - 25 years ago, you could have had an ENUM ('M', 'F') - now you can have:

  • Unknown
  • Intersex
  • Non-binary
  • Decline-to-answer
  • Don't know
  • Don't care :-)

&c., &c...

With a FOREIGN KEY things are far easier to change into the future. Furthermore, you make the point that you want your database to be well-structured, extensible, and "intuitive" for future maintainers(*) - FOREIGN KEYs will do this by making your system portable to another RDBMS! After an FK, I'd go for a CHECK CONSTRAINT!

(*) upvoting for this alone! If only all system designers were so thoughtful - some of my sleepless nights as a programmer might have been avoided! :-)

Check out this thread on SO (both question and accepted answer from the same poster). It tries to go through all options (but misses one - see below!) and the "consensus" (if such a thing can be said to exist on these sites) is, FKs, followed by CCs. Check out this - there's a reason that poster has a rep > 300,000 (not to mention a shedload here too)!

Gotchas. MySQL (amazingly!) doesn't support CKs! Plus, another factor mitigating against CCs is that, frequently, DBAs and/or sysadmins are the only operators allowed to change system tables (which is what modifying CCs involves) whereas ordinary programmers/users can change ordinary tables.

A corollary to this is that if you want tight control over disk activity, you might want the smallest possible amount of it on your system tables - every operation involving a CC involves querying them, so the overhead of querying the FK tables is not such a big deal after all.

A factor in favour of CCs over ENUMs is that you can use the LIKE and other operators on them - you can't with the latter. Finally, take a look at the horrible syntax here to compare values in ENUMs to other values (remember they're different types!). And if you're doing lookups of values for, say, dropdowns, you'd have to use something like this:

select n.nspname as enum_schema,  
       t.typname as enum_name,  
       e.enumlabel as enum_value
from pg_type t 
   join pg_enum e on t.oid = e.enumtypid  
   join pg_catalog.pg_namespace n ON n.oid = t.typnamespace

as opposed to:

select my_field from my_table;

All in all, I would strongly urge you to use FKs or CCs to enforce Declarative Referential Integrity (DRI) on your tables.

There are (mentioned only for completeness) ways of restricting values in tables - you could use DOMAINs or TRIGGERs. The former seem superior to ENUMs, whereas TRIGGERs (IMHO) should only be used for complex business logic!

Finally, to answer the question (phew...).

Your schema seems fine - it's pretty difficult to judge these things by eye. What I would do if I were you, is to start using/developing against it and see if you come up against problems. Get back to us here with specific issues if you do - your question is a bit broad! p.s. welcome to the forum!