Postgresql – Is it ever a good idea to denormalize for integrity

database-design postgresql

I'm using Postgres 9.2 to develop a quiz app, in which I present the user with a series of problems and record their answers.

These problems can take a variety of forms – they might be multiple choice (What's 2 + 2? A: 2. B: 3. C: 4), or they might require the user to calculate an answer of their own, in which case I need to constrain their input to be something like '440' or '1/2' or '.333'. Some of the problems might prompt the user to type in an essay. And, of course, I may need to add more types of problems later.

The tables I'm envisioning, in a simplified form, are something like this:

CREATE TABLE problems
(
  problem_id serial NOT NULL PRIMARY KEY,
  problem_type text NOT NULL, -- Refers to a lookup table
  answer_letter text, -- If not null, refers to the correct answer in the answers table below.
  response text -- If not null, represents a correct answer to be input, like '0.4'
);

CREATE TABLE answers
(
  problem_id integer REFERENCES problems (problem_id), -- Foreign key
  answer_letter text,
  content text,

  CONSTRAINT answers_pkey PRIMARY KEY (problem_id, answer_letter)
);

CREATE TABLE questions
(
  user_id integer,
  created_at timestamptz,
  problem_id integer REFERENCES problems (problem_id), -- Foreign key
  answer_letter text,
  response text,

  CONSTRAINT questions_pkey PRIMARY KEY (user_id, created_at)
);

So, the problems table would have a variety of constraints to ensure that:

  • When problem_type is 'multiple_choice', answer_letter must not be null and response must be null.
  • When problem_type is 'user_input', answer_letter must be null and response must not be null. Response must also consist of only a few characters.
  • When problem_type is 'essay', both answer_letter and response must be null, since I can't really have a correct answer for an essay question.
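For concreteness, the three rules above could be expressed as a single CHECK constraint along these lines (a sketch only — the constraint name and the character limit on response are placeholders):

```sql
ALTER TABLE problems
  ADD CONSTRAINT problems_type_shape_check CHECK (
    (problem_type = 'multiple_choice'
       AND answer_letter IS NOT NULL AND response IS NULL)
    OR (problem_type = 'user_input'
       AND answer_letter IS NULL AND response IS NOT NULL
       AND length(response) <= 10) -- placeholder limit
    OR (problem_type = 'essay'
       AND answer_letter IS NULL AND response IS NULL)
  );
```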

This is clean enough, and constrains the problems table just fine. I might use an enum instead of a lookup table for problem_type, since all of its possible values will already be baked into the schema anyway.
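If I go the enum route, that would look something like:

```sql
CREATE TYPE problem_type AS ENUM ('multiple_choice', 'user_input', 'essay');
```

with problems.problem_type declared as that type instead of text. New types can be appended later with ALTER TYPE problem_type ADD VALUE, which Postgres has supported since 9.1.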

My difficulty is how to constrain the questions table. Its constraints will be very similar (I don't want an answer_letter supplied for a question that references an essay problem, and so on). I can think of a few options:

  1. Create a unique index on problems (problem_id, problem_type), add a problem_type field to questions and include it in the reference to problems, then use it in check constraints in a way similar to problems. This is the way I'm leaning right now, since it seems cleanest overall, but then I'm denormalizing to achieve the proper constraints, which feels wrong to me.
  2. Create three problem tables, one for each type, and constrain them separately. Do the same with three question tables. This feels like the pure relational way to me, which is typically what I'd like to go for, but it also feels way too complex. I don't want to have to deal with unioning three tables (or more, later on) to get a user's question history.
  3. Go with #2, but use Postgres' inheritance support to try to keep the logic simple. But since you can't point a foreign key at a hierarchy of tables, this isn't a clean solution either; it would need to be hacked around.
  4. Use triggers to ensure that questions data fits the corresponding problem. Maybe it's because I don't have much experience with triggers, but I'm concerned that cramming that sort of imperative logic into the DB will eventually become unmanageable.
  5. Forget the question constraints, handle it in app logic, and hope for the best. You can't constrain everything all the time. Of course, I don't really like this idea either.

I feel like there's a problem with my approach to modeling that's leading me to these difficulties, especially since I'm running into a couple of cases very similar to this elsewhere in my schema (this one was just the easiest to describe). Maybe it's just a domain that's difficult to model, but I feel like there must be a better way, and I'm just not normalizing correctly.

Help? Thanks!

Best Answer

You are getting at something that is missing from standard normalization discussions, namely constraint dependencies. In general, wider tables provide greater possibilities here than narrower ones, so in my view the sort of questions you are asking actually highlights a good reason to denormalize. I would go with your first solution (the one you are leaning towards right now).

In my view, good database design normalizes as far as possible but ensures that all columns necessary for proper data constraints are included. You can do some of this with composite foreign keys, provided you don't mind adding additional unique constraints on the parent tables. Leveraging data constraints is an important part of database design, and one should not sacrifice it for the sake of normalization that merely looks good in theory.
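Applied to the schema in the question, the composite-foreign-key approach might look like this (a sketch with illustrative constraint names; the ADD COLUMN assumes questions is still empty, otherwise you'd backfill before adding NOT NULL):

```sql
-- Let (problem_id, problem_type) be a referenceable key on the parent.
ALTER TABLE problems
  ADD CONSTRAINT problems_id_type_key UNIQUE (problem_id, problem_type);

-- Carry the type down into questions so it can participate in checks.
ALTER TABLE questions
  ADD COLUMN problem_type text NOT NULL;

-- The composite FK guarantees the copied type always matches the parent row.
ALTER TABLE questions
  ADD CONSTRAINT questions_problem_fkey
    FOREIGN KEY (problem_id, problem_type)
    REFERENCES problems (problem_id, problem_type);

-- Now the per-type shape rules can be enforced locally.
ALTER TABLE questions
  ADD CONSTRAINT questions_type_shape_check CHECK (
    (problem_type = 'multiple_choice'
       AND answer_letter IS NOT NULL AND response IS NULL)
    OR (problem_type = 'user_input'
       AND answer_letter IS NULL AND response IS NOT NULL)
    OR (problem_type = 'essay'
       AND answer_letter IS NULL AND response IS NULL)
  );
```

The duplicated problem_type column is the denormalization, but the foreign key keeps it consistent with the parent table automatically, so the redundancy can never drift out of sync.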