MySQL Insert Performance – MySQL Stuck on INSERT INTO … SELECT

insertMySQL

I have a problem with my database. I have a production server running MySQL 5.5.33 and a local test server running MySQL 5.5.5. The only other difference I can think of is that the prod DB is being replicated

They both run the same exact SQL databases (local is a replica of the prod environment).

I need to create a materialized view in MySQL which will contain a subset of the record_status_events table. Below you will find the query that does this. This same exact query works in under a second on local, but halts on prod. It gets stuck in the "Sending data" status. If I export the data and import it manually it works fine. It is just the INSERT INTO … SELECT combo causing the problems.

The underlying query I am trying to insert runs under 150ms and has 50k entries.

I tried:

  • Changing the DB engine
  • Wrapping it in a transaction
  • Removing all indexes
  • Removing the primary key
  • Using the auto increment and ignoring the key from the source table
DROP TABLE IF EXISTS `hlh_record_status_submission`;
CREATE TABLE `hlh_record_status_submission` (
  `id` int(10) unsigned NOT NULL,
  `user_id` int(10) unsigned NOT NULL,
  `record_id` int(10) unsigned NOT NULL,
  `status` tinyint(3) unsigned NOT NULL,
  `created_at` datetime NOT NULL,
  PRIMARY KEY (`id`),
  KEY `Created At` (`created_at`),
  KEY `status` (`status`),
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

INSERT INTO `hlh_record_status_submission`
SELECT * FROM `hlh_record_status_events`
WHERE id IN (SELECT MIN(id) FROM `hlh_record_status_events`
                  WHERE status IN (21,24,31,40) GROUP BY record_id);

Best Answer

Give this formulation a try:

INSERT INTO `hlh_record_status_submission`
SELECT rse.* FROM `hlh_record_status_events` AS rse
    JOIN ( SELECT MIN(id) AS min_id
              FROM `hlh_record_status_events`
              WHERE status IN (21,24,31,40)
              GROUP BY record_id
         ) AS x  ON min_Id = rse.id;

and replace INDEX(status) with

INDEX(status, record_id, id)

Also, you should really change all your tables to InnoDB.

The subquery (a "derived" table) will be faster because of the index starting with what is in its WHERE clause and because that index is "covering". The JOINing to the outer part is (I assume) using the PRIMARY KEY(id) making that efficient.

Historically, the construct IN ( SELECT ... ) has been notoriously inefficient, so I try to avoid it. JOIN and EXISTS ( SELECT ... ) are sometimes viable alternatives.

More discussion of building optimal indexes: http://mysql.rjweb.org/doc.php/index_cookbook_mysql