Mysql – Why does IN (subquery) perform bad when = (subquery) is blazing fast

innodbMySQLsubquery

Have a look at these two queries:

Simple:

mysql> SELECT * FROM omgbatch JOIN omgoutput ON (idomgbatch=omgbatch) WHERE idomgoutput = (SELECT DISTINCT lastoutput FROM omgenvelope WHERE lastinput=18658);
+------------+-------+--------------------+-------------+---------+----------+
| idomgbatch | rutin | batchtime          | idomgoutput | flowmod | omgbatch |
+------------+-------+--------------------+-------------+---------+----------+
|      12174 |     8 | 20140508040930-832 |       10728 |     162 |    12174 |
+------------+-------+--------------------+-------------+---------+----------+
1 row in set (0.00 sec)

and

Complex:

mysql> SELECT * FROM omgbatch JOIN omgoutput ON (idomgbatch=omgbatch) WHERE idomgoutput IN (SELECT DISTINCT lastoutput FROM omgenvelope WHERE lastinput=18658);
+------------+-------+--------------------+-------------+---------+----------+
| idomgbatch | rutin | batchtime          | idomgoutput | flowmod | omgbatch |
+------------+-------+--------------------+-------------+---------+----------+
|      12174 |     8 | 20140508040930-832 |       10728 |     162 |    12174 |
+------------+-------+--------------------+-------------+---------+----------+
1 row in set (7.40 sec)

In this case I knew that the subquery returns only 1 value (12174) so I could use the = operator for the subquery result set and get the answer in 0.00 seconds.

When I keep the IN operator, it takes a whole 7.4 seconds.
I have changed my code into a loop for the subquery and a simple = in the "main" select, and all is well, but I'd still like to understand the mechanics behind this.

I'm too novice to understand all the details from the EXPLAIN command.

The query with IN lists three rows with select_type PRIMARY, PRIMARY, and DEPENDENT SUBQUERY, with corresponding row counts 10172, 1, and 721. Corresponding "keys": NULL, PRIMARY, fk_omgenvelope_omginput1_idx.

The = version that can only match 1 value instead does select_types PRIMARY, PRIMARY and SUBQUERY with row counts 1, 1, and 721.

There are 10114 rows in table omgoutput, 1854182 rows in table omgenvelope and 10108 rows in table omgbatch, so I don't understand what the 10172 in the complex EXPLAIN output comes from.

For completeness, here are the full EXPLAINations of the query plans:

Simple:

+----+-------------+-------------+-------+------------------------------------+------------------------------+---------+-------+------+------------------------------+
| id | select_type | table       | type  | possible_keys                      | key                          | key_len | ref   | rows | Extra                        |
+----+-------------+-------------+-------+------------------------------------+------------------------------+---------+-------+------+------------------------------+
|  1 | PRIMARY     | omgoutput   | const | PRIMARY,fk_omgoutput_omgbatch1_idx | PRIMARY                      | 4       | const |    1 |                              |
|  1 | PRIMARY     | omgbatch    | const | PRIMARY                            | PRIMARY                      | 4       | const |    1 |                              |
|  2 | SUBQUERY    | omgenvelope | ref   | fk_omgenvelope_omginput1_idx       | fk_omgenvelope_omginput1_idx | 4       |       |  721 | Using where; Using temporary |
+----+-------------+-------------+-------+------------------------------------+------------------------------+---------+-------+------+------------------------------+

Complex:

+----+--------------------+-------------+--------+------------------------------------------------------------+------------------------------+---------+-----------------------------+-------+------------------------------+
| id | select_type        | table       | type   | possible_keys                                              | key                          | key_len | ref                         | rows  | Extra                        |
+----+--------------------+-------------+--------+------------------------------------------------------------+------------------------------+---------+-----------------------------+-------+------------------------------+
|  1 | PRIMARY            | omgoutput   | ALL    | fk_omgoutput_omgbatch1_idx                                 | NULL                         | NULL    | NULL                        | 10173 | Using where                  |
|  1 | PRIMARY            | omgbatch    | eq_ref | PRIMARY                                                    | PRIMARY                      | 4       | tracksdb.omgoutput.omgbatch |     1 |                              |
|  2 | DEPENDENT SUBQUERY | omgenvelope | ref    | fk_omgenvelope_omginput1_idx,fk_omgenvelope_omgoutput1_idx | fk_omgenvelope_omginput1_idx | 4       | const                       |   721 | Using where; Using temporary |
+----+--------------------+-------------+--------+------------------------------------------------------------+------------------------------+---------+-----------------------------+-------+------------------------------+

How shall I think when parsing this? What is the difference between DEPENDENT SUBQUERY and just SUBQUERY without dependent?

It doesn't seem to be just the IN and more than 1 potential row in the subquery that does it. Replacing the subquery with several constant values is also quick:

mysql> explain SELECT * FROM omgbatch JOIN omgoutput ON (idomgbatch=omgbatch) WHERE idomgoutput IN (10728,10729,10000,10,100,0,23123);
+----+-------------+-----------+--------+------------------------------------+---------+---------+-----------------------------+------+-------------+
| id | select_type | table     | type   | possible_keys                      | key     | key_len | ref                         | rows | Extra       |
+----+-------------+-----------+--------+------------------------------------+---------+---------+-----------------------------+------+-------------+
|  1 | SIMPLE      | omgoutput | range  | PRIMARY,fk_omgoutput_omgbatch1_idx | PRIMARY | 4       | NULL                        |    7 | Using where |
|  1 | SIMPLE      | omgbatch  | eq_ref | PRIMARY                            | PRIMARY | 4       | tracksdb.omgoutput.omgbatch |    1 |             |
+----+-------------+-----------+--------+------------------------------------+---------+---------+-----------------------------+------+-------------+

Best Answer

According to this answer:

This is a known problem in MySQL. Rumors say it will be fixed.

The problem is due to a missing optimization when a subquery using IN is incorrectly indentified as dependent subquery instead of an independent subquery.

This causes the subquery to be run a huge number of times, which is unnecessary.