MySQL order by slow

MySQLorder-byperformancequery-performance

I have a table named 'customer' with the following structure and indexes:

DESCRIBE customer;
+--------------------+---------------+------+-----+-------------------+-----------------------------+
| Field              | Type          | Null | Key | Default           | Extra                       |
+--------------------+---------------+------+-----+-------------------+-----------------------------+
| id                 | int(11)       | NO   | PRI | NULL              | auto_increment              |
| email              | varchar(70)   | NO   | UNI | NULL              |                             |
| orders             | int(11)       | YES  |     | NULL              |                             |
| country            | text          | YES  |     | NULL              |                             |
| last_date_order    | datetime      | YES  |     | NULL              |                             |
| first_date_order   | datetime      | YES  |     | NULL              |                             |
| name               | varchar(100)  | YES  |     | NULL              |                             |
| lastname           | varchar(100)  | YES  |     | NULL              |                             |
| male               | tinyint(1)    | YES  |     | NULL              |                             |
| gender_probability | decimal(10,2) | YES  |     | NULL              |                             |
| phone              | varchar(20)   | YES  |     | NULL              |                             |
| phone_formatted    | varchar(20)   | YES  |     | NULL              |                             |
| created_at         | timestamp     | YES  |     | NULL              |                             |
| updated_at         | timestamp     | NO   | MUL | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+--------------------+---------------+------+-----+-------------------+-----------------------------+

SHOW INDEX FROM customer;
 +----------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table    | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| customer |          0 | PRIMARY    |            1 | id          | A         |     1899746 |     NULL | NULL   |      | BTREE      |         |               |
| customer |          0 | email      |            1 | email       | A         |     1899746 |     NULL | NULL   |      | BTREE      |         |               |
| customer |          1 | updated_at |            1 | updated_at  | A         |         200 |     NULL | NULL   |      | BTREE      |         |               |
+----------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

I can execute this query, which takes 15 seconds approximately:

SELECT * FROM customer WHERE updated_at >= '2018-02-16 00:00:00';
931817 rows in set (14.33 sec)

And this is the same query but adding an order by:

SELECT * FROM customer WHERE updated_at >= '2018-02-16 00:00:00' ORDER BY updated_at ASC;
931817 rows in set (5 min 21.17 sec)

I think 5 mins is too slow, the explain for the query shows:

EXPLAIN SELECT * FROM customer WHERE updated_at >= '2018-02-16 00:00:00' ORDER BY updated_at ASC;
+----+-------------+----------+------+---------------+------+---------+------+---------+-----------------------------+
| id | select_type | table    | type | possible_keys | key  | key_len | ref  | rows    | Extra                       |
+----+-------------+----------+------+---------------+------+---------+------+---------+-----------------------------+
|  1 | SIMPLE      | customer | ALL  | updated_at    | NULL | NULL    | NULL | 1899744 | Using where; Using filesort |
+----+-------------+----------+------+---------------+------+---------+------+---------+-----------------------------+

The profile says this:

SET profiling = 1;
SELECT * FROM customer WHERE updated_at >= '2018-02-16 00:00:00' ORDER BY updated_at ASC;
SHOW PROFILE;
SET profiling = 0;

+------------------------------+------------+
| Status                       | Duration   |
+------------------------------+------------+
| Sending data                 |   0.036616 |
| Waiting for query cache lock |   0.000034 |
| Sending data                 |   0.088051 |
| Waiting for query cache lock |   0.000034 |
| Sending data                 |   0.064104 |
| Waiting for query cache lock |   0.000038 |
| Sending data                 |   0.045420 |
| Waiting for query cache lock |   0.000043 |
| Sending data                 |   0.035219 |
| Waiting for query cache lock |   0.000025 |
| Sending data                 |   0.049755 |
| Waiting for query cache lock |   0.000035 |
| Sending data                 |   0.069008 |
| Waiting for query cache lock |   0.000036 |
| Sending data                 |   0.068619 |
| Waiting for query cache lock |   0.000042 |
| Sending data                 |   0.067039 |
| Waiting for query cache lock |   0.000048 |
| Sending data                 |   0.051564 |
| Waiting for query cache lock |   0.000042 |
| Sending data                 |   0.052017 |
| Waiting for query cache lock |   0.000036 |
| Sending data                 |   0.049105 |
| Waiting for query cache lock |   0.000034 |
| Sending data                 |   0.060379 |
| Waiting for query cache lock |   0.000029 |
| Sending data                 |   0.118096 |
| Waiting for query cache lock |   0.000027 |
| Sending data                 |   0.046678 |
| Waiting for query cache lock |   0.000037 |
| Sending data                 |   0.058142 |
| Waiting for query cache lock |   0.000027 |
| Sending data                 |   0.137221 |
| Waiting for query cache lock |   0.000046 |
| Sending data                 |   0.061968 |
| Waiting for query cache lock |   0.000045 |
| Sending data                 |   0.054456 |
| Waiting for query cache lock |   0.000045 |
| Sending data                 |   0.054738 |
| Waiting for query cache lock |   0.000025 |
| Sending data                 |   0.073921 |
| Waiting for query cache lock |   0.000028 |
| Sending data                 |   0.040226 |
| Waiting for query cache lock |   0.000034 |
| Sending data                 |   0.036024 |
| Waiting for query cache lock |   0.000046 |
| Sending data                 |   0.063350 |
| Waiting for query cache lock |   0.000035 |
| Sending data                 |   0.044174 |
| Waiting for query cache lock |   0.000028 |
| Sending data                 |   0.057658 |
| Waiting for query cache lock |   0.000031 |
| Sending data                 |   0.081529 |
| Waiting for query cache lock |   0.000041 |
| Sending data                 |   0.074443 |
| Waiting for query cache lock |   0.000027 |
| Sending data                 |   0.058293 |
| Waiting for query cache lock |   0.000033 |
| Sending data                 |   0.061443 |
| Waiting for query cache lock |   0.000037 |
| Sending data                 |   0.075515 |
| Waiting for query cache lock |   0.000035 |
| Sending data                 |   0.075264 |
| Waiting for query cache lock |   0.000039 |
| Sending data                 |   0.049696 |
| Waiting for query cache lock |   0.000027 |
| Sending data                 |   0.052088 |
| Waiting for query cache lock |   0.000036 |
| Sending data                 |   0.049119 |
| Waiting for query cache lock |   0.000039 |
| Sending data                 |   0.053930 |
| Waiting for query cache lock |   0.000028 |
| Sending data                 |   0.064894 |
| Waiting for query cache lock |   0.000039 |
| Sending data                 |   0.046105 |
| Waiting for query cache lock |   0.000027 |
| Sending data                 |   0.061840 |
| Waiting for query cache lock |   0.000033 |
| Sending data                 |   0.065950 |
| Waiting for query cache lock |   0.000038 |
| Sending data                 |   0.053379 |
| Waiting for query cache lock |   0.000028 |
| Sending data                 |   0.034599 |
| Waiting for query cache lock |   0.000027 |
| Sending data                 |   0.046371 |
| Waiting for query cache lock |   0.000033 |
| Sending data                 |   0.039776 |
| Waiting for query cache lock |   0.000035 |
| Sending data                 |   0.046480 |
| Waiting for query cache lock |   0.000040 |
| Sending data                 |   0.039515 |
| Waiting for query cache lock |   0.000036 |
| Sending data                 | 405.206648 |
| end                          |   0.000040 |
| query end                    |   0.000011 |
| closing tables               |   0.000017 |
| freeing items                |   0.000036 |
| logging slow query           |   0.000006 |
| logging slow query           |   0.000011 |
| cleaning up                  |   0.000006 |
+------------------------------+------------+
100 rows in set (0.01 sec)

I would appreciate any thoughts on why the order by impacts the performance so much? 15s to 5mins+ is a large jump.

EXPLAIN for the query of RolandoMySQLDBA

SELECT B.* FROM (SELECT id FROM customer WHERE updated_at >= '2018-02-16 00:00:00' ORDER BY updated_at ASC) A LEFT JOIN customer B USING (id);
932016 rows in set (6 min 6.39 sec)

+----+-------------+------------+--------+---------------+------------+---------+------+--------+--------------------------+
| id | select_type | table      | type   | possible_keys | key        | key_len | ref  | rows   | Extra                    |
+----+-------------+------------+--------+---------------+------------+---------+------+--------+--------------------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL          | NULL       | NULL    | NULL | 932025 |                          |
|  1 | PRIMARY     | B          | eq_ref | PRIMARY       | PRIMARY    | 4       | A.id |      1 |                          |
|  2 | DERIVED     | customer   | range  | updated_at    | updated_at | 4       | NULL | 949956 | Using where; Using index |
+----+-------------+------------+--------+---------------+------------+---------+------+--------+--------------------------+

Best Answer

Based on what you have in the question

  • 1899746 rows in the table
  • 931817 rows > 2018-02-16 00:00:00
  • Index on updated_at has 200 distinct values

The EXPLAIN plan shows a full table scan and a sort of a temp table which has all columns. This makes sense since the WHERE clause sees 49.0496% of the table rows. It is easier to do a full table than figure out from the key distribution that accessing the index also requires accessing the table. This explains why the EXPLAIN plan did not choose the updated_at index.

The distribution of key values in the updated_at index can be seen by running this

SELECT COUNT(1) rowcount,updated_at FROM customer GROUP BY updated_at;

You could use the index to collect the ids first and then join them later

SELECT B.* FROM
(SELECT id FROM customer
WHERE updated_at >= '2018-02-16 00:00:00'
ORDER BY updated_at ASC) A
LEFT JOIN customer B USING (id);

If the subquery collecting the ids chooses the index, then the whole query may be fast. Please run the EXPLAIN on this and see if this is the case.