Mysql – Does MySQL have problems with nested insert like with subqueries in where

MySQLperformancesubquery

Query 1:

INSERT `personal`.`locations` 
  SELECT DISTINCT `s`.* 
    FROM `references` `t`
      JOIN `locations` `s` ON `first_id` = `s`.`id` 
    WHERE 
      `lat` >= 37.3
      AND `lat` <= 37.3
      AND `lng` >= -122.2 
      AND `lng` <= -122.1 
ON DUPLICATE KEY UPDATE 
  `lat`  = `s`.`lat`,
  `lng`  = `s`.`lng`,
  `name` = `s`.`name`,
  `desr` = `s`.`desr`;

Query 2:

INSERT `personal`.`locations` 
  SELECT DISTINCT `s`.* 
    FROM `references` `t`
      JOIN `locations` `s` ON (`first_id` = `s`.`id` OR `second_id` = `s`.`id`) 
    WHERE 
      `lat` >= 37.3
      AND `lat` <= 37.3
      AND `lng` >= -122.2 
      AND `lng` <= -122.1 
ON DUPLICATE KEY UPDATE 
  `lat`  = `s`.`lat`,
  `lng`  = `s`.`lng`,
  `name` = `s`.`name`,
  `desr` = `s`.`desr`;

The select in query 1 takes 0.008 seconds to select 4 million records.

The select in query 2 takes 1 second to select 300 thousand records.

Query 1 executes completely in 60 seconds.

Query 2 executes completely in 300 seconds.

The conclusion: MySQL seems to repeat the select for every insert just like with where clause subqueries.

Is there a solution?


Edit 1: Added new query

Query 2: faster alternative but still with the same issue

INSERT `personal`.`locations` 
  SELECT DISTINCT `s`.* 
    FROM `references` `t`
      JOIN `locations` `s` ON `first_id` = `s`.`id` 
    WHERE 
      `lat` >= 37.3
      AND `lat` <= 37.3
      AND `lng` >= -122.2 
      AND `lng` <= -122.1 
UNION ALL
  SELECT DISTINCT `s`.* 
    FROM `references` `t`
      JOIN `locations` `s` ON `second_id` = `s`.`id` 
    WHERE 
      `lat` >= 37.3
      AND `lat` <= 37.3
      AND `lng` >= -122.2 
      AND `lng` <= -122.1 
ON DUPLICATE KEY UPDATE 
  `lat`  = `s`.`lat`,
  `lng`  = `s`.`lng`,
  `name` = `s`.`name`,
  `desr` = `s`.`desr`;

Slightly faster despite the fact it actually executes more updates but since the two selects execute faster the the one they replace it gains a bit of time.


Edit 2: Added table structure

CREATE TABLE IF NOT EXISTS `references` (
  `id`        bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `first_id`  bigint(20) unsigned NOT NULL DEFAULT '0',
  `second_id` bigint(20) unsigned NOT NULL DEFAULT '0',
  `name`      varchar(255) NOT NULL,
  `status`    enum('V','I','D') NOT NULL DEFAULT 'V',
  PRIMARY KEY (`id`),
  KEY `first_id`  (`first_id`),
  KEY `second_id` (`second_id`),
  KEY `status`    (`status`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

CREATE TABLE IF NOT EXISTS `locations` (
  `id`        bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `lat`       double NOT NULL DEFAULT '0',
  `lng`       double NOT NULL DEFAULT '0',
  `name`      varchar(40) NOT NULL DEFAULT '0',
  `desr`      varchar(254) NOT NULL DEFAULT '0',
  `status`    enum('V','I','D') NOT NULL DEFAULT 'V',
  PRIMARY KEY (`id`),
  KEY `lat`    (`lat`),
  KEY `lng`    (`lng`)
  KEY `status` (`status`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Each reference has to have at least one location associated based on first_id. The second is optional but it does exist for 70%-80% of the records. Each location can be associated with multiple references.

Best Answer

Using OR in the JOIN part of your query results in the query to be performed for all 4 million records returned from the result set of the left hand side OR clause.

So using a solution with UNION is definitely faster.