Handling Unknown Column Types at Creation Time in MySQL

MySQLschema

I have a table in a MySQL database that stores survey results.

CREATE TABLE survey_responses (
  response_id INTEGER UNSIGNED AUTO_INCREMENT,
  survey_id INTEGER UNSIGNED NOT NULL,
  question_id INTEGER UNSIGNED NOT NULL,
=>response WELL_IT_DEPENDS_REALLY,
  FOREIGN KEY (survey_id) REFERENCES surveys (survey_id),
  FOREIGN KEY (question_id) REFERENCES questions (question_id)
) ENGINE=InnoDB;

If the question is free form, a BLOB may be best for the response. If the question is "select the best answer", an INT pointing to that answer's record would be best. I'm sure I could store this in a document database and determine the type during runtime. But I would hate to maintain two databases just for this one column.

I've thought about having response_blob, response_int, response_char, and having them default to NULL. Then I would read the appropriate column at runtime. Is that acceptable or just a silly idea?

Is there a standard way this situation is handled?

Best Answer

Looks like you are describing a subtype/supertype structure.

Start by creating a reference table to hold the different types of response types you can create. This table can be expanded easily if you come up with more types later on.

CREATE TABLE survey_response_types (
  `response_type_id` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
  `type` VARCHAR(255) NOT NULL,
) ENGINE=InnoDB;

Modify your survey_responses table to use the response_type_id value instead of actually holding the response. This is the discriminator that lets you know what type of response you have when looking at or joining into the survey_responses table.

CREATE TABLE survey_responses (
  `response_id` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
  `survey_id` INTEGER UNSIGNED NOT NULL,
  `question_id` INTEGER UNSIGNED NOT NULL,
  `response_type_id` INTEGER UNSIGNED NOT NULL,
  FOREIGN KEY (`survey_id`) REFERENCES surveys (`survey_id`),
  FOREIGN KEY (`question_id`) REFERENCES questions (`question_id`),
  FOREIGN KEY (`response_type_id` REFERENCES `response_types` (`response_type_id`)
) ENGINE=InnoDB;

Then for each type of response, create another table, whose primary key is exactly the the same as the primary key of the survey_responses table. Also, make a foreign key constraint back to the survey_responses table. Do NOT create a new auto_increment key for this table, there is no need.

CREATE TABLE text_responses (
  `response_id` INTEGER UNSIGNED NOT NULL,
  `text` TEXT NOT NULL,
  FOREIGN KEY (`response_id`) REFERENCES `survey_responses` (`response_id`),
) ENGINE=InnoDB;

You mentioned linking back to the int of an answer, so here is how this table would look.

CREATE TABLE answer_responses (
  `response_id` INTEGER UNSIGNED NOT NULL,
  `answer_id` INTEGER UNSIGNED NOT NULL,
  FOREIGN KEY (`response_id`) REFERENCES `survey_responses` (`response_id`),
  FOREIGN KEY (`answer_id`) REFERENCES `answers` (`answer_id`)
) ENGINE=InnoDB;

Add as many more tables as there are possible types of answers.

When inserting the data, you should always insert an entry both into the survey_responses table, and then within the same transaction, insert an entry into the correct subtype table. MySQL does not support check constraints, so you'll have to handle the constraint code in the application layer to ensure that the response_type_id always matches correct subtype table that is used.

Together a subtype and supertype create a single 'record' that you can access. They should always exist in pairs. Never just a super type (survey_responses table record) nor just a subtype (which the foreign key constraint fortunately automatically prevents).

You can query them however you desire though. If you just want to make sure a record exists, simply query or join into the survey_responses table. If you want the actual response, construct your joins and queries appropriately.

Using this setup you can avoid having multiple null columns, and also protect against accidents where somehow a response gets two different types of responses. Although the lack of check constraints in MySQL will actually allow two types of responses in the subtype/supertype setup, the discriminator will let you identify the correct one when querying the data. (Any incorrect ones would be a result of bad coding in the application code.)