I have a table named 'customer' with the following structure and indexes:
DESCRIBE customer;
+--------------------+---------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+---------------+------+-----+-------------------+-----------------------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| email | varchar(70) | NO | UNI | NULL | |
| orders | int(11) | YES | | NULL | |
| country | text | YES | | NULL | |
| last_date_order | datetime | YES | | NULL | |
| first_date_order | datetime | YES | | NULL | |
| name | varchar(100) | YES | | NULL | |
| lastname | varchar(100) | YES | | NULL | |
| male | tinyint(1) | YES | | NULL | |
| gender_probability | decimal(10,2) | YES | | NULL | |
| phone | varchar(20) | YES | | NULL | |
| phone_formatted | varchar(20) | YES | | NULL | |
| created_at | timestamp | YES | | NULL | |
| updated_at | timestamp | NO | MUL | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+--------------------+---------------+------+-----+-------------------+-----------------------------+
SHOW INDEX FROM customer;
+----------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| customer | 0 | PRIMARY | 1 | id | A | 1899746 | NULL | NULL | | BTREE | | |
| customer | 0 | email | 1 | email | A | 1899746 | NULL | NULL | | BTREE | | |
| customer | 1 | updated_at | 1 | updated_at | A | 200 | NULL | NULL | | BTREE | | |
+----------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
I can execute this query, which takes 15 seconds approximately:
SELECT * FROM customer WHERE updated_at >= '2018-02-16 00:00:00';
931817 rows in set (14.33 sec)
And this is the same query but adding an order by
:
SELECT * FROM customer WHERE updated_at >= '2018-02-16 00:00:00' ORDER BY updated_at ASC;
931817 rows in set (5 min 21.17 sec)
I think 5 mins is too slow, the explain for the query shows:
EXPLAIN SELECT * FROM customer WHERE updated_at >= '2018-02-16 00:00:00' ORDER BY updated_at ASC;
+----+-------------+----------+------+---------------+------+---------+------+---------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+---------------+------+---------+------+---------+-----------------------------+
| 1 | SIMPLE | customer | ALL | updated_at | NULL | NULL | NULL | 1899744 | Using where; Using filesort |
+----+-------------+----------+------+---------------+------+---------+------+---------+-----------------------------+
The profile says this:
SET profiling = 1;
SELECT * FROM customer WHERE updated_at >= '2018-02-16 00:00:00' ORDER BY updated_at ASC;
SHOW PROFILE;
SET profiling = 0;
+------------------------------+------------+
| Status | Duration |
+------------------------------+------------+
| Sending data | 0.036616 |
| Waiting for query cache lock | 0.000034 |
| Sending data | 0.088051 |
| Waiting for query cache lock | 0.000034 |
| Sending data | 0.064104 |
| Waiting for query cache lock | 0.000038 |
| Sending data | 0.045420 |
| Waiting for query cache lock | 0.000043 |
| Sending data | 0.035219 |
| Waiting for query cache lock | 0.000025 |
| Sending data | 0.049755 |
| Waiting for query cache lock | 0.000035 |
| Sending data | 0.069008 |
| Waiting for query cache lock | 0.000036 |
| Sending data | 0.068619 |
| Waiting for query cache lock | 0.000042 |
| Sending data | 0.067039 |
| Waiting for query cache lock | 0.000048 |
| Sending data | 0.051564 |
| Waiting for query cache lock | 0.000042 |
| Sending data | 0.052017 |
| Waiting for query cache lock | 0.000036 |
| Sending data | 0.049105 |
| Waiting for query cache lock | 0.000034 |
| Sending data | 0.060379 |
| Waiting for query cache lock | 0.000029 |
| Sending data | 0.118096 |
| Waiting for query cache lock | 0.000027 |
| Sending data | 0.046678 |
| Waiting for query cache lock | 0.000037 |
| Sending data | 0.058142 |
| Waiting for query cache lock | 0.000027 |
| Sending data | 0.137221 |
| Waiting for query cache lock | 0.000046 |
| Sending data | 0.061968 |
| Waiting for query cache lock | 0.000045 |
| Sending data | 0.054456 |
| Waiting for query cache lock | 0.000045 |
| Sending data | 0.054738 |
| Waiting for query cache lock | 0.000025 |
| Sending data | 0.073921 |
| Waiting for query cache lock | 0.000028 |
| Sending data | 0.040226 |
| Waiting for query cache lock | 0.000034 |
| Sending data | 0.036024 |
| Waiting for query cache lock | 0.000046 |
| Sending data | 0.063350 |
| Waiting for query cache lock | 0.000035 |
| Sending data | 0.044174 |
| Waiting for query cache lock | 0.000028 |
| Sending data | 0.057658 |
| Waiting for query cache lock | 0.000031 |
| Sending data | 0.081529 |
| Waiting for query cache lock | 0.000041 |
| Sending data | 0.074443 |
| Waiting for query cache lock | 0.000027 |
| Sending data | 0.058293 |
| Waiting for query cache lock | 0.000033 |
| Sending data | 0.061443 |
| Waiting for query cache lock | 0.000037 |
| Sending data | 0.075515 |
| Waiting for query cache lock | 0.000035 |
| Sending data | 0.075264 |
| Waiting for query cache lock | 0.000039 |
| Sending data | 0.049696 |
| Waiting for query cache lock | 0.000027 |
| Sending data | 0.052088 |
| Waiting for query cache lock | 0.000036 |
| Sending data | 0.049119 |
| Waiting for query cache lock | 0.000039 |
| Sending data | 0.053930 |
| Waiting for query cache lock | 0.000028 |
| Sending data | 0.064894 |
| Waiting for query cache lock | 0.000039 |
| Sending data | 0.046105 |
| Waiting for query cache lock | 0.000027 |
| Sending data | 0.061840 |
| Waiting for query cache lock | 0.000033 |
| Sending data | 0.065950 |
| Waiting for query cache lock | 0.000038 |
| Sending data | 0.053379 |
| Waiting for query cache lock | 0.000028 |
| Sending data | 0.034599 |
| Waiting for query cache lock | 0.000027 |
| Sending data | 0.046371 |
| Waiting for query cache lock | 0.000033 |
| Sending data | 0.039776 |
| Waiting for query cache lock | 0.000035 |
| Sending data | 0.046480 |
| Waiting for query cache lock | 0.000040 |
| Sending data | 0.039515 |
| Waiting for query cache lock | 0.000036 |
| Sending data | 405.206648 |
| end | 0.000040 |
| query end | 0.000011 |
| closing tables | 0.000017 |
| freeing items | 0.000036 |
| logging slow query | 0.000006 |
| logging slow query | 0.000011 |
| cleaning up | 0.000006 |
+------------------------------+------------+
100 rows in set (0.01 sec)
I would appreciate any thoughts on why the order by
impacts the performance so much? 15s to 5mins+ is a large jump.
EXPLAIN for the query of RolandoMySQLDBA
SELECT B.* FROM (SELECT id FROM customer WHERE updated_at >= '2018-02-16 00:00:00' ORDER BY updated_at ASC) A LEFT JOIN customer B USING (id);
932016 rows in set (6 min 6.39 sec)
+----+-------------+------------+--------+---------------+------------+---------+------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+------------+---------+------+--------+--------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 932025 | |
| 1 | PRIMARY | B | eq_ref | PRIMARY | PRIMARY | 4 | A.id | 1 | |
| 2 | DERIVED | customer | range | updated_at | updated_at | 4 | NULL | 949956 | Using where; Using index |
+----+-------------+------------+--------+---------------+------------+---------+------+--------+--------------------------+
Best Answer
Based on what you have in the question
1899746
rows in the table931817
rows >2018-02-16 00:00:00
updated_at
has 200 distinct valuesThe EXPLAIN plan shows a full table scan and a sort of a temp table which has all columns. This makes sense since the WHERE clause sees 49.0496% of the table rows. It is easier to do a full table than figure out from the key distribution that accessing the index also requires accessing the table. This explains why the EXPLAIN plan did not choose the
updated_at
index.The distribution of key values in the
updated_at
index can be seen by running thisYou could use the index to collect the ids first and then join them later
If the subquery collecting the ids chooses the index, then the whole query may be fast. Please run the EXPLAIN on this and see if this is the case.