Mysql – Query performance with subquery and IN clause

MySQLperformancesubquery

I am trying to select a range of data for multiple devices (unique serial numbers) from a historical table and was wondering why there is such a big difference in time for the following queries:

Basically I am trying to use the IN clause to indicate what items I want to fetch data for. If I "hard code" the items in the IN clause, the query is fast, if I use a subquery or join to select the items the performance is poor.

This query completes in 0.15s and returns 7382 rows.

SELECT `readings`.* FROM `readings`
WHERE
  (SerialNumber IN ('091146000121', *snip 25*, '091146000556'))
AND (readings.time >= 1325404800)
AND (readings.time < 1326317400)
ORDER BY `time` ASC

The same query rewritten using a subquery to get the serial numbers takes over 30 seconds, and seems to spend most of its time in the Preparing state. It returns the same data as the first query.

SELECT `readings`.* FROM `readings`
WHERE
  (SerialNumber IN (SELECT `boards`.`id` AS `SerialNumber` FROM `boards` WHERE (siteId = '1')))
AND (readings.time >= 1325404800)
AND (readings.time < 1326317400)
ORDER BY `time` ASC

The subquery returns the same values that are in the first query, but as stated, this takes a lot longer to run. Are they not functionally equivalent?

Here is the explain for both queries:

+----+-------------+----------+-------+---------------+---------+---------+------+------+-----------------------------+
| id | select_type | table    | type  | possible_keys | key     | key_len | ref  | rows | Extra                       |
+----+-------------+----------+-------+---------------+---------+---------+------    +------+-----------------------------+
|  1 | SIMPLE      | readings | range | PRIMARY,time  | PRIMARY | 22      | NULL | 7339 | Using where; Using filesort |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-----------------------------+

+----+--------------------+----------+-----------------+----------------+---------+---------+------+---------+-------------+
| id | select_type        | table    | type            | possible_keys  | key     | key_len | ref  | rows    | Extra       |
+----+--------------------+----------+-----------------+----------------+---------+---------+------+---------+-------------+
|  1 | PRIMARY            | readings | range           | time           | time    | 4       | NULL | 6353234 | Using where |
|  2 | DEPENDENT SUBQUERY | boards   | unique_subquery | PRIMARY,siteId | PRIMARY | 18      | func |       1 | Using where |
+----+--------------------+----------+-----------------+----------------+---------+---------+------+---------+-------------+

For some reason the query with the subselect isn't using the primary key. I tried using USE INDEX, but that actually made it take a lot longer.

The readings table has PRIMARY KEY SerialNumber, time with an index on time.
The boards table has PRIMARY KEY id (SerialNumber) and index on siteId.

The MySQL version I'm using is 5.5.8-log MySQL Community Server (GPL)

I'm just wondering why the performance of both queries isn't very similar. Thanks.

Update:
Here are the create table statements:

mysql> SHOW CREATE TABLE readings\G
*************************** 1. row ***************************
       Table: readings
Create Table: CREATE TABLE `readings` (
  `time` int(11) NOT NULL,
  `boxsn` varchar(16) NOT NULL,
  `rev` varchar(16) NOT NULL,
  `schema` tinyint(3) unsigned NOT NULL,
  `interval` smallint(5) unsigned NOT NULL,
  `relay` tinyint(4) NOT NULL,
  `inputV` decimal(10,6) NOT NULL,
  `inputA` decimal(10,6) NOT NULL,
  `outputV` decimal(10,6) NOT NULL,
  `outputA` decimal(10,6) NOT NULL,
  `phase` tinyint(4) NOT NULL,
  `outputVA` decimal(10,6) NOT NULL,
  `watts` decimal(10,6) NOT NULL DEFAULT '0.000000',
  `var` decimal(10,6) NOT NULL,
  `kiloVAHours` decimal(9,9) DEFAULT '0.000000000',
  `kilowattHours` decimal(9,9) NOT NULL,
  `kilovarHours` decimal(9,9) NOT NULL,
  PRIMARY KEY (`boxsn`,`time`),
  KEY `time` (`time`),
  KEY `boxsn_time_ndx` (`boxsn`,`time`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> SHOW CREATE TABLE boards\G
*************************** 1. row ***************************
       Table: boards
Create Table: CREATE TABLE `boards` (
  `id` varchar(16) NOT NULL,
  `siteId` int(11) NOT NULL,
  `groupId` int(11) DEFAULT '0',
  `lastReport` int(11) DEFAULT NULL,
  `lastIp` varchar(15) DEFAULT '0.0.0.0',
  `label` varchar(24) DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `siteId` (`siteId`),
  KEY `siteId_id_ndx` (`siteId`,`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC

Best Answer

Refactor the query as follows:

SELECT
    readings.*
FROM
    (
        SELECT boxsn FROM readings
        WHERE (time >= 1325404800) 
        AND (time < 1326317400) 
        ORDER BY `time` ASC
    ) readings_keys
    LEFT JOIN
    (
        SELECT id AS boxsn FROM boards WHERE siteId = '1'
    ) boards
    USING (boxsn)
    LEFT JOIN readings
    USING (boxsn)
;

Make sure you have the following indexes:

ALTER TABLE boards ADD INDEX siteId_id_ndx (siteId,id);
ALTER TABLE readings ADD INDEX time_boxsn_ndx (time,boxsn);

You can drop the other index

ALTER TABLE readings DROP INDEX boxsn_time_ndx;

You should definitely see a dramatic improvement in performance as the tables grow.

In your case,

  • The first EXPLAIN plan says you have to perform a lookup of SerialNumber for each row in readings against a list of value in memory
  • The second EXPLAIN plan says you have to perform a lookup of SerialNumber for each row in readings against a table.

UPDATE 2012-01-12 14:03 EDT

I refactored it again to make sure the readings keys and boards keys are combined correctly before retrieving the data from the readings table:

SELECT 
    readings.* 
FROM 
    ( 
        SELECT A.* FROM
        (
            SELECT boxsn FROM readings 
            WHERE (time >= 1325404800)  
            AND (time < 1326317400)  
            ORDER BY `time` ASC
        ) A
        LEFT JOIN
        (
            SELECT id AS boxsn
            FROM boards
            WHERE siteId = '1'
        ) B
        USING (boxsn)
        WHERE B.boxsn IS NOT NULL
    ) readings_keys 
    LEFT JOIN readings 
    USING (boxsn) 
;