Hey there,
this is my first question so please be patient if I miss to add something in here.
My question is why the OR
in the WHERE
clause causes the query to be executed the way it is (which is poorly). There is some detailed information about my scenario below.
Server Version is ArchLinux MariaDB 10.1.35-1
While I was trying to improve the wall time of a query I noticed that the optimizer does some bad optimizing for a relatively simple query. In this simplified scenario I have two tables, person and contract. A contract has one or two affiliates (foreign key to person).
describe person;
+-----------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| firstname | varchar(45) | YES | | NULL | |
| lastname | varchar(45) | YES | MUL | NULL | |
+-----------+-------------+------+-----+---------+----------------+
describe contract;
+------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| state | varchar(45) | YES | MUL | NULL | |
| affiliate1 | int(11) | YES | MUL | NULL | |
| affiliate2 | int(11) | YES | | NULL | |
+------------+-------------+------+-----+---------+----------------+
There are indexes on (state), (state,affiliate1), (state,affiliate2), (state,affiliate1,affiliate2) and (lastname).
The query I was working on should summarize how many contracts are in which state, filtered by the lastname of either affiliate. For example I execute a query for the lastname 'Doe' I get something like
+-------------+----------+
| state | count(*) |
+-------------+----------+
| CREATED | 5 |
| IN PROGRESS | 1 |
| ACCEPTED | 2 |
+-------------+----------+
The initial query looked like this
SELECT state,count(*)
FROM contract
INNER JOIN person a1 ON a1.id=contract.affiliate1
LEFT OUTER JOIN person a2 ON a2.id=contract.affiliate2
WHERE a1.lastname=? OR a2.lastname=?
GROUP BY state;
I tested this query on a database with 1 million contracts, each having two distinct affiliates (so there are 2 million persons). This query took about 3 seconds to execute. If a left out one of a1.lastname=?
or a2.lastname=?
the query takes about 0.01 second. In the end I came up with using a UNION
to produce my result in a single query (which is very fast, too).
Here is the execution plan for the slow query I got from the slow query log:
# Query_time: 2.449397 Lock_time: 0.001430 Rows_sent: 1 Rows_examined: 3000000
# Rows_affected: 0
# Full_scan: Yes Full_join: No Tmp_table: No Tmp_table_on_disk: No
# Filesort: No Filesort_on_disk: No Merge_passes: 0 Priority_queue: No
#
# explain: id select_type table type possible_keys key key_len ref rows r_rows filtered r_filtered Extra
# explain: 1 SIMPLE contract index UK_contract_affiliate1,FK_contract_affiliate2 id_a1_a2 13 NULL 902857 1000000.00 100.00 100.00 Using where; Using index
# explain: 1 SIMPLE a1 eq_ref PRIMARY,lastname PRIMARY 4 contract.affiliate1 1 1.00 100.00 100.00
# explain: 1 SIMPLE a2 eq_ref PRIMARY,lastname PRIMARY 4 contract.affiliate2 1 1.00 100.00 0.00 Using where
#
EDIT: As requested, the EXPLAIN
output
EXPLAIN SELECT contract.state
FROM contract
INNER JOIN person kn1 ON contract.id_affiliate1=affiliate1.id
WHERE affiliate1.lastname='ABC'
+------+-------------+----------+--------+------------------------------------------------------------+---------------+---------+----------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+----------+--------+------------------------------------------------------------+---------------+---------+----------------------+------+--------------------------+
| 1 | SIMPLE | kn1 | ref | PRIMARY,lastname | lastname | 272 | const | 13 | Using where; Using index |
| 1 | SIMPLE | contract | eq_ref | UK_contract_affiliate1,state,affiliate1_type_status,affiliate2_type_status | UK_contract_affiliate1 | 4 | affiliate1.id | 1 | Using where |
+------+-------------+----------+--------+------------------------------------------------------------+---------------+---------+----------------------+------+--------------------------+
EXPLAIN SELECT contract.state
FROM contract
INNER JOIN person a1 ON contract.affiliate1=a1.id a1.lastname='ABC'
UNION ALL
SELECT contract.state
FROM contract
INNER JOIN person a2 ON contract.affiliate2=a2.id a2.lastname='ABC'
+------+-------------+------------+--------+----------------------------------------------------------------+------------------------+---------+----------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+------------+--------+----------------------------------------------------------------+------------------------+---------+----------------------+------+--------------------------+
| 1 | PRIMARY | affiliate1 | ref | PRIMARY,lastname | lastname | 272 | const | 1 | Using where; Using index |
| 1 | PRIMARY | contract | eq_ref | UK_contract_affiliate1,state,affiliate1_state,affiliate2_state | UK_contract_affiliate1 | 4 | affiliate1.id | 1 | Using where |
| 2 | UNION | affiliate2 | ref | PRIMARY,lastname | lastname | 272 | const | 1 | Using where; Using index |
| 2 | UNION | contract | ref | FK_contract_affiliate2,state,affiliate1_state,affiliate2_state | FK_contract_affiliate2 | 5 | affiliate2.id | 1 | Using where |
+------+-------------+------------+--------+----------------------------------------------------------------+------------------------+---------+----------------------+------+--------------------------+
Best Answer
OR
does not optimize well because it would involve searching for all possible rows with one of the things, then check for the other, then combine the two lists.Your original query is even worse since the things being OR'd together come from different tables. MySQL really punts on that. So it reads both tables entirely.
The
UNION
does what the Optimizer can't do; that is focus on one part of theOR
at a time.See my index cookbook