MySQL – Querying data from a pivoted MySQL table takes a long time to execute

Tags: MySQL, pivot

TL;DR
Querying data from a two-table database I set up a long time ago takes a really long time to execute.


Back when I was a student, quite some time ago, I developed a simple survey tool for a college project. I later launched it as a free service for fellow students to use. Now, many years later, my survey tool is still used by students. I kind of neglected the tool for a long time, until I recently tried it again and noticed it has become painfully slow when viewing survey results. Oops. There are much better survey tools/services available now, but the collected survey data is still valuable to many people, I guess. I wrote it in PHP with a MySQL database. The database contains well over 6,000 different surveys.

I am familiar with MySQL, but I am by no means a database expert. I certainly wasn't back then and I suspect I made some poor choices structuring the database.

Depending on the number of participants in a survey, running a query to filter results from a single survey can take up to 12 seconds(!). I upgraded to a better server, but this didn't help as much as I hoped it would; it only brought the query time down from 18 to 12 seconds.

Because of the required flexibility (every survey has different questions) I set up two database tables. All the fixed data, such as the unique record ID, timestamp, user's IP address, etc., is inserted into the main table. The second table holds all the answers: one record for each answer to each question. To get all the answers, the second table gets 'pivoted' using CASE statements in the SQL query (shown below). I think this might not have been the most efficient setup, and maybe this is the reason why filtering results is so slow.

So, this is how I structured the database.

The main table with fixed data (form_submits)

+--------------------+--------------+------+-----+---------+----------------+
| Field              | Type         | Null | Key | Default | Extra          |
+--------------------+--------------+------+-----+---------+----------------+
| id                 | bigint(20)   | NO   | PRI | NULL    | auto_increment |
| formid             | mediumint(9) | NO   | MUL | NULL    |                |
| email              | varchar(255) | NO   | MUL | NULL    |                |
| status             | tinyint(1)   | NO   |     | NULL    |                |
| submit_date        | int(11)      | NO   |     | NULL    |                |
| submit_ip          | varchar(46)  | NO   |     | NULL    |                |
| submit_ua          | text         | NO   |     | NULL    |                |
| confirm_date       | int(11)      | NO   |     | NULL    |                |
| confirm_ip         | varchar(46)  | NO   |     | NULL    |                |
| notifications_send | tinyint(4)   | NO   |     | NULL    |                |
| notification_last  | int(11)      | NO   |     | NULL    |                |
+--------------------+--------------+------+-----+---------+----------------+

The data table with the questions and answers (form_submits_data)

+-----------+--------------+------+-----+---------+-------+
| Field     | Type         | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| submit_id | mediumint(9) | NO   | PRI | NULL    |       |
| form_id   | int(11)      | NO   | MUL | NULL    |       |
| field_id  | mediumint(9) | NO   | PRI | NULL    |       |
| question  | text         | NO   |     | NULL    |       |
| answer    | text         | NO   |     | NULL    |       |
+-----------+--------------+------+-----+---------+-------+
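
For completeness, the two tables correspond roughly to the following DDL (the storage engine and the exact index definitions behind the MUL keys shown above are approximate):

-- Approximate DDL for the two tables; engine and index names are illustrative.
CREATE TABLE form_submits (
  id                 BIGINT(20)    NOT NULL AUTO_INCREMENT,
  formid             MEDIUMINT(9)  NOT NULL,
  email              VARCHAR(255)  NOT NULL,
  status             TINYINT(1)    NOT NULL,
  submit_date        INT(11)       NOT NULL,
  submit_ip          VARCHAR(46)   NOT NULL,
  submit_ua          TEXT          NOT NULL,
  confirm_date       INT(11)       NOT NULL,
  confirm_ip         VARCHAR(46)   NOT NULL,
  notifications_send TINYINT(4)    NOT NULL,
  notification_last  INT(11)       NOT NULL,
  PRIMARY KEY (id),
  KEY idx_formid (formid),
  KEY idx_email (email)
) ENGINE=InnoDB;

CREATE TABLE form_submits_data (
  submit_id MEDIUMINT(9) NOT NULL,
  form_id   INT(11)      NOT NULL,
  field_id  MEDIUMINT(9) NOT NULL,
  question  TEXT         NOT NULL,
  answer    TEXT         NOT NULL,
  PRIMARY KEY (submit_id, field_id),
  KEY idx_form_id (form_id)
) ENGINE=InnoDB;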

If, for example, I want to get all the answers from participants who live in a certain postcode area, this query is used:

SELECT 
fs.id, 
fs.email,
coalesce(MAX(case when field_id = 10 then answer end), 0) as 'C10',
coalesce(MAX(case when field_id = 11 then answer end), 0) as 'C11',
coalesce(MAX(case when field_id = 12 then answer end), 0) as 'C12',
coalesce(MAX(case when field_id = 15 then answer end), 0) as 'C15',
coalesce(MAX(case when field_id = 16 then answer end), 0) as 'C16',
coalesce(MAX(case when field_id = 17 then answer end), 0) as 'C17',
coalesce(MAX(case when field_id = 21 then answer end), 0) as 'C21',
coalesce(MAX(case when field_id = 32 then answer end), 0) as 'C32',
coalesce(MAX(case when field_id = 36 then answer end), 0) as 'C36',
coalesce(MAX(case when field_id = 38 then answer end), 0) as 'C38',
coalesce(MAX(case when field_id = 340 then answer end), 0) as 'C340',
coalesce(MAX(case when field_id = 341 then answer end), 0) as 'C341',
coalesce(MAX(case when field_id = 342 then answer end), 0) as 'C342',
coalesce(MAX(case when field_id = 343 then answer end), 0) as 'C343',
coalesce(MAX(case when field_id = 344 then answer end), 0) as 'C344'
FROM `form_submits` fs
LEFT JOIN `form_submits_data` fsd ON fsd.submit_id = fs.id
WHERE fs.formid = 45023 
AND fs.status = 1
GROUP BY fs.id
HAVING C32 LIKE 'HA0 1AA%'
ORDER BY fs.submit_date DESC

On a survey with 21,000 responses this takes 12 seconds to execute. I feel this can't be right, and that it is probably because I structured the database poorly. This makes me wonder: what should I have done differently? What setup/structure would be more efficient while retaining the flexibility?

Best Answer

I wouldn't say you did anything wrong. Funnily enough, I just suggested that someone do the same thing you did for a similar problem:

How should I model a database structure to retain calculation results?

You might want to explore a semi-structured approach where, instead of your "data" table having so many rows, you add a column to the "main" table that stores all the answers in a JSON object that gets parsed at query time. This is relatively new in MySQL, but PostgreSQL and SQL Server are a little farther along.
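
In MySQL (5.7 and later, where the native JSON type was introduced) that could be as simple as adding a JSON column to the main table; the column name answers below is just an example, not part of your existing schema:

-- Example only: `answers` is an illustrative column name.
-- Requires MySQL 5.7+ for the native JSON data type.
ALTER TABLE form_submits
  ADD COLUMN answers JSON NULL;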

In other words, after a survey on your favorite fruit and favorite color, your main table would have an answer column that looked like this:

{"f1":"apples","f2":"blue"}

and you would parse it at query time into two columns called f1 and f2. There is some obvious overhead associated with parsing the JSON, but on a VM running on my laptop it was about 0.5 seconds for 100K rows.
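
For example, with MySQL's JSON functions the filter query could look roughly like this (a sketch, assuming the illustrative answers column from above; ->> is shorthand for JSON_UNQUOTE(JSON_EXTRACT(...))):

-- Sketch: parse the JSON document at query time instead of pivoting a
-- separate answers table. Assumes the illustrative `answers` JSON column above.
SELECT
    fs.id,
    fs.email,
    fs.answers ->> '$.f1' AS f1,  -- favorite fruit
    fs.answers ->> '$.f2' AS f2   -- favorite color
FROM form_submits fs
WHERE fs.formid = 45023
  AND fs.status = 1
  AND fs.answers ->> '$.f2' = 'blue'
ORDER BY fs.submit_date DESC;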