MySQL – Data Model for Online Examination Matching Questions

database-designMySQL

I'm designing a module for a client that allows them to administer certification examinations via their website to seminar attendees. I'm stuck on how to store the data for matching questions and keep the questions and answers in different tables.

First, if my data model doesn't make sense, please let me know. If there is a more elegant way to think about the problem, I want to know. What follows is part of the install method for the module, for the purpose of showing the additional tables and their columns.

    //Create test table
    $this->db->query("CREATE TABLE IF NOT EXISTS `" . DB_PREFIX . "test` (
      `test_id` int(11) NOT NULL AUTO_INCREMENT,
      `title` varchar(255) NOT NULL,
      `percent_to_pass` int(11) NOT NULL,   
      `attempts_allowed` int(11) NOT NULL,
      `time_limit_seconds` int(11) NOT NULL,
      `date_available` datetime NOT NULL,
      PRIMARY KEY (`test_id`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;");

    //Create products_to_test table
    $this->db->query("CREATE TABLE IF NOT EXISTS `" . DB_PREFIX . "product_to_test` (
      `product_to_test_id` int(11) NOT NULL AUTO_INCREMENT,
      `product_id` int(11) NOT NULL,
      `test_id` int(11) NOT NULL,
      PRIMARY KEY (`product_to_test_id`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;");

    //Create customer_to_test table
    $this->db->query("CREATE TABLE IF NOT EXISTS `" . DB_PREFIX . "customer_to_test` (
      `customer_to_test_id` int(11) NOT NULL AUTO_INCREMENT,
      `customer_id` int(11) NOT NULL,
      `test_id` int(11) NOT NULL,
      `attempts` int(11) NOT NULL, //Number of attempts at the exam
      `attempts_allowed` int(11) NOT NULL, //Will usually be 1, but will be incremented in the instance that the test is reset, allowing for multiple attempts upon approval.
      `percent_correct` int(11) NOT NULL, //Percentage of answers correct
      `responses` varchar(255) NOT NULL, //Serialized list of responses (question_id,answer_id;question_id,answer_id;
      `question_order` varchar(255) NOT NULL, //Serialized list of the questions/answers in the order presented for this exam (question_id,answer_1_id,answer_n_id,...;etc.)
      `passed` tinyint(1) NULL,  //True/False - Did the customer pass the exam?
      `date_taken` datetime NULL,  //Date and Time the exam was taken.
      PRIMARY KEY (`customer_to_test_id`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;");

    //Create test_questions table
    $this->db->query("CREATE TABLE IF NOT EXISTS `" . DB_PREFIX . "test_questions` (
      `question_id` int(11) NOT NULL AUTO_INCREMENT,
      `question_type` int(2) NOT NULL, //Indexed to question types: 1=Multiple Choice, 2=True/False, 3=Matching, 4=Short Answer/Fill in the Blank, 5=Essay
      `question_stem` varchar(255) NOT NULL, //If a matching question, then the stems will be serialized.
      `pre_question_stem_text` varchar(255) NULL,  //For text that applies to multiple questions, i.e. 101-104: Select the most appropriate graph
      `pre_question_stem_span` int(2) NULL, //Indicates how many questions the pre-question stem text should span (for presentation/indentation purposes)
      PRIMARY KEY (`question_id`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;");

    //Create question_to_test table
    $this->db->query("CREATE TABLE IF NOT EXISTS `" . DB_PREFIX . "question_to_test` (
      `question_to_test_id` int(11) NOT NULL AUTO_INCREMENT,
      `test_id` int(11) NOT NULL,
      `question_id` int(11) NOT NULL,
      PRIMARY KEY (`question_to_test_id`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;");

    //Create test_answers table
    $this->db->query("CREATE TABLE IF NOT EXISTS `" . DB_PREFIX . "test_answers` (
      `answer_id` int(11) NOT NULL AUTO_INCREMENT,
      `question_id` int(11) NOT NULL,
      `answer_text` varchar(255) NULL, //If a matching question, this will contain a serialized list of the answer choices. (choice_id,text;etc.)
      `answer_img` varchar(255) NULL,
      `correct` tinyint(1) NOT NULL,
      `correct_pairs` varchar(255) NULL, //If a matching question, then this will contain a serialized list of the correct pairs to compare the responses to.
      PRIMARY KEY (`answer_id`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;");

    //Create answer_to_question table
    $this->db->query("CREATE TABLE IF NOT EXISTS `" . DB_PREFIX . "answer_to_question` (
      `answer_to_question_id` int(11) NOT NULL AUTO_INCREMENT,
      `question_id` int(11) NOT NULL,
      `answer_id` int(11) NOT NULL,
      PRIMARY KEY (`answer_to_question_id`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;");

Now, this makes sense to me for multiple choice, true/false, and I can even make it work for essay questions. It feels like a lot of tables to accomplish something so simple, but it does have the advantage of keeping all the parts of the exam separate and easy to visualize. By serializing certain data depending on the question type, it works.

I've decided to serialize the stems and the answers for matching questions and store them inside the existing tables, and have added an additional field to the answers table to store the correct pairs, which can be read back when the exam is being graded.

I'm still interested in hearing thoughts from those with more experience. Thank you for your time and effort. I'm new to data design, and appreciate the help.

Best Answer

I've worked with a couple of education companies and it seems everyone has their own style. In general, however, you should probably consider the following patterns:

  • A test is just a series of questions, each question having a series of possible answers (multiple choice) or a single text input from the test taker.
  • There are two types of multiple choice questions: single answer (displayed in a radio button form) and multiple answer (check boxes).
  • Yes/No and True/False questions are just single answer multiple choice questions with two choices. In fact, your list of possible answers to display would probably start out with a list of these fixed answers: yes, no, true, false, agree, disagree, left, right, up, down, past, present, future, etc. These can be associated with templates so the test designer can select the needed template instead of entering "True" and "False" over and over again. Most answers are text but some could be images, photos, audio clips or any combination.

  • A question actually contains up to three parts:

    1. An optional intro. This sets up the question or gives needed background or context. "A traveling salesman starts out from city A..."
    2. The interrogation. "How fast will he need to drive to arrive in time for the scheduled meeting?"
    3. The answers. 40 mph, 50 mph, FTL, etc.
  • One intro can set up several sets of interrogation/answer pairs. There could be many questions concerning the traveling salesman, for example. The first question in the set would have the intro, the others would omit an intro. If there are a lot of these sets, you may want to move intro from an attribute of questions to a special type of question -- unanswered.

Freeform or fill-in-the-blank answers are not processed, just stored away for human grading. Sure, theoretically you could give your app the intelligence to read and evaluate such answers, but that wouldn't be written in database code anyway.

You also specify matching type questions. One of best ways of implementing this is to have the ability to group several questions together and associate them with the same set of answers. One attribute of this grouping would be "allow duplicates?" which specifies that the same answer may or may not be allowing more than once within the group.

I hope some of this helps. Test design can get very complex. It all depends on just how much flexibility you want to allow your users. The more flexibility, the easier it will be for your test designers to do a good job, but the more difficult your job as designer and developer will be. But you probably didn't take this job because you disliked challenges.


Oh, here is a general, best practices type of advice. Almost invariably, intersection tables like question_to_test do not need a separate primary key. It's superfluous, such a key is never used.

Think about it. Either you have the ID of a test and you want to see the questions assigned to it, or you have the ID of a question and you want to see the tests it is assigned to. Nowhere is the question_to_test_id value ever used.

The best PK for such a table is the combination of FKs used in the table. You generally don't want the same combination to appear multiple times. That is, the same question associated with the same test more than once.

This may seem trivial. OK, the key field is actually never used and a little extra space is wasted. So what?

For one, the combination of the two FKs should be defined as not null and unique. More extra work.

For another, some DBMSs (like MS SQL Server) allow one clustered index per table and that is used by the PK unless you specifically specify otherwise. So the real key for this table will be unclustered. A waste.