MySQL Index Optimization – Why MySQL Ignores Index Even on FORCE for ORDER BY

explainindexinnodbMySQLoptimization

I run an EXPLAIN:

mysql> explain select last_name from employees order by last_name;
+----+-------------+-----------+------+---------------+------+---------+------+-------+----------------+  
| id | select_type | table     | type | possible_keys | key  | key_len | ref  | rows  | Extra          |
+----+-------------+-----------+------+---------------+------+---------+------+-------+----------------+  
|  1 | SIMPLE      | employees | ALL  | NULL          | NULL | NULL    | NULL | 10031 | Using filesort |
+----+-------------+-----------+------+---------------+------+---------+------+-------+----------------+  
1 row in set (0.00 sec)  

The indexes in my table:

mysql> show index from employees;  
+-----------+------------+---------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+  
| Table     | Non_unique | Key_name      | Seq_in_index | Column_name   | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |  
+-----------+------------+---------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+  
| employees |          0 | PRIMARY       |            1 | subsidiary_id | A         |           6 |     NULL | NULL   |      | BTREE      |         |               |  
| employees |          0 | PRIMARY       |            2 | employee_id   | A         |       10031 |     NULL | NULL   |      | BTREE      |         |               |  
| employees |          1 | idx_last_name |            1 | last_name     | A         |       10031 |      700 | NULL   |      | BTREE      |         |               |  
| employees |          1 | date_of_birth |            1 | date_of_birth | A         |       10031 |     NULL | NULL   | YES  | BTREE      |         |               |  
| employees |          1 | date_of_birth |            2 | subsidiary_id | A         |       10031 |     NULL | NULL   |      | BTREE      |         |               |  
+-----------+------------+---------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+  
5 rows in set (0.02 sec)  

There is an index on last_name but the optimizer does not use it.
So I do:

mysql> explain select last_name from employees force index(idx_last_name) order by last_name;  
+----+-------------+-----------+------+---------------+------+---------+------+-------+----------------+  
| id | select_type | table     | type | possible_keys | key  | key_len | ref  | rows  | Extra          |  
+----+-------------+-----------+------+---------------+------+---------+------+-------+----------------+  
|  1 | SIMPLE      | employees | ALL  | NULL          | NULL | NULL    | NULL | 10031 | Using filesort |  
+----+-------------+-----------+------+---------------+------+---------+------+-------+----------------+  
1 row in set (0.00 sec)  

But still the index is not used! What am I doing wrong here?
Does it have to do with the fact that the index is NON_UNIQUE?
BTW the last_name is VARCHAR(1000)

Update requested by @RolandoMySQLDBA

mysql> SELECT COUNT(DISTINCT last_name) DistinctCount FROM employees;  
+---------------+  
| DistinctCount |  
+---------------+  
|         10000 |  
+---------------+  
1 row in set (0.05 sec)  


mysql> SELECT COUNT(1) FROM (SELECT COUNT(1) Count500,last_name FROM employees GROUP BY last_name HAVING COUNT(1) > 500) A;  
+----------+  
| COUNT(1) |  
+----------+  
|        0 |  
+----------+  
1 row in set (0.15 sec)  

Best Answer

PROBLEM #1

Look at the query

select last_name from employees order by last_name;

I don't see a meaningful WHERE clause, and neither does the MySQL Query Optimizer. There is no incentive to use an index.

PROBLEM #2

Look at the query

select last_name from employees force index(idx_last_name) order by last_name; 

You gave it an index, but the Query Opitmizer took over. I have seen this behavior before (How do I force a JOIN to use a specific index in MySQL?)

Why should this happen?

Without a WHERE clause, Query Optimizer says the following to itself:

  • This is an InnoDB Table
  • It's an indexed column
  • The index has the row_id of the gen_clust_index (a.k.a. Clustered Index)
  • Why should I look at the index when
    • there is no WHERE clause?
    • I would always have to bounce back to the table?
  • Since all rows in an InnoDB table reside in the same 16K blocks as the gen_clust_index, I'll do a full table scan instead.

The Query Optimizer chose the path of least resistance.

You are going to be in for a little shock, but here it goes: Did you know that the Query Optimizer will handle MyISAM quite differently?

You are probably saying HUH ???? HOW ????

MyISAM stores the data in a .MYD file and all indexes in the .MYI file.

The same query will produce a different EXPLAIN plan because the index lives in a different file from the data. Why ? Here is why:

  • The data needed (last_name column) is already ordered in the .MYI
  • In the worst case, you will have a full index scan
  • You will only access the column last_name from the index
  • You do not need to sift through unwanted
  • You will not trigger temp file creation for sorting

How can be so sure of this? I have tested this working theory on how using a different storage will generate a different EXPLAIN plan (sometimes a better one): Must an index cover all selected columns for it to be used for ORDER BY?