Have a look at these two queries:
Simple:
mysql> SELECT * FROM omgbatch JOIN omgoutput ON (idomgbatch=omgbatch) WHERE idomgoutput = (SELECT DISTINCT lastoutput FROM omgenvelope WHERE lastinput=18658); +------------+-------+--------------------+-------------+---------+----------+ | idomgbatch | rutin | batchtime | idomgoutput | flowmod | omgbatch | +------------+-------+--------------------+-------------+---------+----------+ | 12174 | 8 | 20140508040930-832 | 10728 | 162 | 12174 | +------------+-------+--------------------+-------------+---------+----------+ 1 row in set (0.00 sec)
and
Complex:
mysql> SELECT * FROM omgbatch JOIN omgoutput ON (idomgbatch=omgbatch) WHERE idomgoutput IN (SELECT DISTINCT lastoutput FROM omgenvelope WHERE lastinput=18658); +------------+-------+--------------------+-------------+---------+----------+ | idomgbatch | rutin | batchtime | idomgoutput | flowmod | omgbatch | +------------+-------+--------------------+-------------+---------+----------+ | 12174 | 8 | 20140508040930-832 | 10728 | 162 | 12174 | +------------+-------+--------------------+-------------+---------+----------+ 1 row in set (7.40 sec)
In this case I knew that the subquery returns only 1 value (12174
) so I could use the =
operator for the subquery result set and get the answer in 0.00 seconds.
When I keep the IN
operator, it takes a whole 7.4 seconds.
I have changed my code into a loop for the subquery and a simple =
in the "main" select, and all is well, but I'd still like to understand the mechanics behind this.
I'm too novice to understand all the details from the EXPLAIN
command.
The query with IN
lists three rows with select_type PRIMARY
, PRIMARY
, and DEPENDENT SUBQUERY
, with corresponding row counts 10172
, 1
, and 721
. Corresponding "keys": NULL
, PRIMARY
, fk_omgenvelope_omginput1_idx
.
The =
version that can only match 1 value instead does select_types PRIMARY
, PRIMARY
and SUBQUERY
with row counts 1
, 1
, and 721
.
There are 10114
rows in table omgoutput
, 1854182
rows in table omgenvelope
and 10108
rows in table omgbatch
, so I don't understand what the 10172
in the complex EXPLAIN
output comes from.
For completeness, here are the full EXPLAINations of the query plans:
Simple:
+----+-------------+-------------+-------+------------------------------------+------------------------------+---------+-------+------+------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------------+-------+------------------------------------+------------------------------+---------+-------+------+------------------------------+ | 1 | PRIMARY | omgoutput | const | PRIMARY,fk_omgoutput_omgbatch1_idx | PRIMARY | 4 | const | 1 | | | 1 | PRIMARY | omgbatch | const | PRIMARY | PRIMARY | 4 | const | 1 | | | 2 | SUBQUERY | omgenvelope | ref | fk_omgenvelope_omginput1_idx | fk_omgenvelope_omginput1_idx | 4 | | 721 | Using where; Using temporary | +----+-------------+-------------+-------+------------------------------------+------------------------------+---------+-------+------+------------------------------+
Complex:
+----+--------------------+-------------+--------+------------------------------------------------------------+------------------------------+---------+-----------------------------+-------+------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------------------+-------------+--------+------------------------------------------------------------+------------------------------+---------+-----------------------------+-------+------------------------------+ | 1 | PRIMARY | omgoutput | ALL | fk_omgoutput_omgbatch1_idx | NULL | NULL | NULL | 10173 | Using where | | 1 | PRIMARY | omgbatch | eq_ref | PRIMARY | PRIMARY | 4 | tracksdb.omgoutput.omgbatch | 1 | | | 2 | DEPENDENT SUBQUERY | omgenvelope | ref | fk_omgenvelope_omginput1_idx,fk_omgenvelope_omgoutput1_idx | fk_omgenvelope_omginput1_idx | 4 | const | 721 | Using where; Using temporary | +----+--------------------+-------------+--------+------------------------------------------------------------+------------------------------+---------+-----------------------------+-------+------------------------------+
How shall I think when parsing this? What is the difference between DEPENDENT SUBQUERY
and just SUBQUERY
without dependent?
It doesn't seem to be just the IN
and more than 1 potential row in the subquery that does it. Replacing the subquery with several constant values is also quick:
mysql> explain SELECT * FROM omgbatch JOIN omgoutput ON (idomgbatch=omgbatch) WHERE idomgoutput IN (10728,10729,10000,10,100,0,23123); +----+-------------+-----------+--------+------------------------------------+---------+---------+-----------------------------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------+--------+------------------------------------+---------+---------+-----------------------------+------+-------------+ | 1 | SIMPLE | omgoutput | range | PRIMARY,fk_omgoutput_omgbatch1_idx | PRIMARY | 4 | NULL | 7 | Using where | | 1 | SIMPLE | omgbatch | eq_ref | PRIMARY | PRIMARY | 4 | tracksdb.omgoutput.omgbatch | 1 | | +----+-------------+-----------+--------+------------------------------------+---------+---------+-----------------------------+------+-------------+
Best Answer
According to this answer:
This is a known problem in MySQL. Rumors say it will be fixed.
The problem is due to a missing optimization when a subquery using IN is incorrectly indentified as dependent subquery instead of an independent subquery.
This causes the subquery to be run a huge number of times, which is unnecessary.