Mysql – Wrong query execution plan for large tables

indexinnodbmariadbMySQL

Problem: We are experiencing a problem with large tables (+40M rows) having frequently updated their indexed columns with low selectivity. It seems that when the column is often updated/deleted after some time optimizer starts to choose the wrong execution plan. In my opinion the issue could be in outdated statistics (or not) provided to the optimizer (e.g. rec_per_key). Here is an example:

MariaDB [test]> select version();
+----------------------------------+
| version()                        |
+----------------------------------+
| 10.0.33-MariaDB-0ubuntu0.16.04.1 |
+----------------------------------+ 


MariaDB [test]> show create table t1 \G;
    *************************** 1. row ***************************
           Table: t1
    Create Table: CREATE TABLE `t1` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `status` enum('a','b','c','d','e') DEFAULT 'a',
      PRIMARY KEY (`id`),
      KEY `status` (`status`)
    ) ENGINE=InnoDB AUTO_INCREMENT=12001 DEFAULT CHARSET=utf8mb4

MariaDB [test]> show index from t1 \G;
*************************** 1. row ***************************
        Table: t1
   Non_unique: 0
     Key_name: PRIMARY
 Seq_in_index: 1
  Column_name: id
    Collation: A
  Cardinality: 11980
     Sub_part: NULL
       Packed: NULL
         Null: 
   Index_type: BTREE
      Comment: 
Index_comment: 
*************************** 2. row ***************************
        Table: t1
   Non_unique: 1
     Key_name: status
 Seq_in_index: 1
  Column_name: status
    Collation: A
  Cardinality: 15
     Sub_part: NULL
       Packed: NULL
         Null: YES
   Index_type: BTREE
      Comment: 
Index_comment:

MariaDB [test]> select index_name,last_update,stat_name,stat_value,sample_size,stat_description from mysql.innodb_index_stats where database_name='test' AND table_name='t1';
+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| index_name | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| PRIMARY    | 2018-01-30 18:31:07 | n_diff_pfx01 |      11555 |          20 | id                                |
| PRIMARY    | 2018-01-30 18:31:07 | n_leaf_pages |         20 |        NULL | Number of leaf pages in the index |
| PRIMARY    | 2018-01-30 18:31:07 | size         |         21 |        NULL | Number of pages in the index      |
| status     | 2018-01-30 18:31:07 | n_diff_pfx01 |          4 |          10 | status                            |
| status     | 2018-01-30 18:31:07 | n_diff_pfx02 |      11575 |          10 | status,id                         |
| status     | 2018-01-30 18:31:07 | n_leaf_pages |         10 |        NULL | Number of leaf pages in the index |
| status     | 2018-01-30 18:31:07 | size         |         11 |        NULL | Number of pages in the index      |
+------------+---------------------+--------------+------------+-------------+-----------------------------------+

MariaDB [test]> select count(distinct(status)) from t1;
+-------------------------+
| count(distinct(status)) |
+-------------------------+
|                       4 |
+-------------------------+

MariaDB [test]> explain select * from t1 where status='a';
+------+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
| id   | select_type | table | type | possible_keys | key    | key_len | ref   | rows | Extra                    |
+------+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
|    1 | SIMPLE      | t1    | ref  | status        | status | 2       | const | 5038 | Using where; Using index |
+------+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+

MariaDB [test]> show status like '%handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 1     |
| Handler_read_last          | 0     |
| Handler_read_next          | 5039  |
| Handler_read_prev          | 0     |
| Handler_read_retry         | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+

MariaDB [test]> select count(*) from t1 where status='a';
+----------+
| count(*) |
+----------+
|     5039 |
+----------+

MariaDB [test]> update t1 set status='b' where status='a' limit 1000;
Query OK, 1000 rows affected (0.22 sec)
Rows matched: 1000  Changed: 1000  Warnings: 0

MariaDB [test]> select count(*) from t1 where status='a';
+----------+
| count(*) |
+----------+
|     4039 |
+----------+
1 row in set (0.01 sec)

MariaDB [test]> explain select * from t1 where status='a';
+------+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
| id   | select_type | table | type | possible_keys | key    | key_len | ref   | rows | Extra                    |
+------+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
|    1 | SIMPLE      | t1    | ref  | status        | status | 2       | const | 5038 | Using where; Using index |
+------+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
1 row in set (0.00 sec)

MariaDB [test]> show status like '%handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 1     |
| Handler_read_last          | 0     |
| Handler_read_next          | 4039  |
| Handler_read_prev          | 0     |
| Handler_read_retry         | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
26 rows in set (0.01 sec)

MariaDB [test]> select index_name,last_update,stat_name,stat_value,sample_size,stat_description from mysql.innodb_index_stats where database_name='test' AND table_name='t1';
+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| index_name | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| PRIMARY    | 2018-01-30 18:52:25 | n_diff_pfx01 |      11980 |          20 | id                                |
| PRIMARY    | 2018-01-30 18:52:25 | n_leaf_pages |         20 |        NULL | Number of leaf pages in the index |
| PRIMARY    | 2018-01-30 18:52:25 | size         |         21 |        NULL | Number of pages in the index      |
| status     | 2018-01-30 18:52:25 | n_diff_pfx01 |          4 |          12 | status                            |
| status     | 2018-01-30 18:52:25 | n_diff_pfx02 |      12058 |          12 | status,id                         |
| status     | 2018-01-30 18:52:25 | n_leaf_pages |         12 |        NULL | Number of leaf pages in the index |
| status     | 2018-01-30 18:52:25 | size         |         13 |        NULL | Number of pages in the index      |
+------------+---------------------+--------------+------------+-------------+-----------------------------------+
7 rows in set (0.00 sec)

MariaDB [test]> show index from t1 \G;
*************************** 1. row ***************************
        Table: t1
   Non_unique: 0
     Key_name: PRIMARY
 Seq_in_index: 1
  Column_name: id
    Collation: A
  Cardinality: 11980
     Sub_part: NULL
       Packed: NULL
         Null: 
   Index_type: BTREE
      Comment: 
Index_comment: 
*************************** 2. row ***************************
        Table: t1
   Non_unique: 1
     Key_name: status
 Seq_in_index: 1
  Column_name: status
    Collation: A
  Cardinality: 15
     Sub_part: NULL
       Packed: NULL
         Null: YES
   Index_type: BTREE
      Comment: 
Index_comment:



MariaDB [test]> show variables like 'innodb_stat%';
+--------------------------------------+-------------+
| Variable_name                        | Value       |
+--------------------------------------+-------------+
| innodb_stats_auto_recalc             | ON          |
| innodb_stats_include_delete_marked   | OFF         |
| innodb_stats_method                  | nulls_equal |
| innodb_stats_modified_counter        | 0           |
| innodb_stats_on_metadata             | OFF         |
| innodb_stats_persistent              | ON          |
| innodb_stats_persistent_sample_pages | 20          |
| innodb_stats_sample_pages            | 8           |
| innodb_stats_traditional             | ON          |
| innodb_stats_transient_sample_pages  | 8           |
| innodb_status_output                 | OFF         |
| innodb_status_output_locks           | OFF         |
+--------------------------------------+-------------+

Analyze table doesn't help. Not even changing innodb_stats_persistent_sample_pages to higher values. The only solutions so far are rebuilding the table (alter, mysqldump/recover) or use/force index.

In our real world scenario optimizer (at least shown by explained) assumes it has to examine 1 600 000 rows, when in reality the handler statistic shows that only 400 rows have been checked.

Any help would be really appreciated.

Edited – elaborating on Rick's post:

1. Regarding "Cardinality": Does the optimizer assume that all values occur equally frequently ? Do you mean that if I have a table with 20 rows and index with cardinality of 4, it expects to find the same value in every fourth row ?

MariaDB [test]> select count(*) from t1;
+----------+
| count(*) |
+----------+
|    12000 |
+----------+

MariaDB [test]> select count(*) from t1 where status='a';
+----------+
| count(*) |
+----------+
|     3039 |
+----------+

MariaDB [test]> explain select status from t1 where status='a';
+------+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
| id   | select_type | table | type | possible_keys | key    | key_len | ref   | rows | Extra                    |
+------+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
|    1 | SIMPLE      | t1    | ref  | status        | status | 2       | const | 3038 | Using where; Using index |
+------+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+

Explain shows rows examined 3038, so it knows pretty well how many rows it has to check. To me it doesn't look like it assumes that all values occur equally frequently. Maybe I've just misunderstood your explanation 🙂

2. Regarding Select using the index: the point was to show how the rows examined differ when the indexed column is frequently updated.

3. Regarding Explain output: I agree that it's often inaccurate.

4. Regarding poor index choice: I agree, but unfortunately we can't change that right now.

5. Regarding Select count(): the point was only to show how many rows matched the where clause.

Another and perhaps better example of how we are affected by this issue.

Table 1

MariaDB [test]> show create table t1;
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                                               |
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t1    | CREATE TABLE `t1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `status` enum('a','b','c','d','e') DEFAULT 'a',
  PRIMARY KEY (`id`),
  KEY `status` (`status`)
) ENGINE=InnoDB AUTO_INCREMENT=12001 DEFAULT CHARSET=utf8mb4 |
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Table 2

MariaDB [test]> show create table t1;
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                                               |
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t1    | CREATE TABLE `t1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `status` enum('a','b','c','d','e') DEFAULT 'a',
  PRIMARY KEY (`id`),
  KEY `status` (`status`)
) ENGINE=InnoDB AUTO_INCREMENT=12001 DEFAULT CHARSET=utf8mb4 |
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Indexes for table t1

MariaDB [test]> select * from mysql.innodb_index_stats where database_name='test' and table_name='t1';
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| test          | t1         | PRIMARY    | 2018-01-31 12:28:17 | n_diff_pfx01 |      11980 |          20 | id                                |
| test          | t1         | PRIMARY    | 2018-01-31 12:28:17 | n_leaf_pages |         20 |        NULL | Number of leaf pages in the index |
| test          | t1         | PRIMARY    | 2018-01-31 12:28:17 | size         |         21 |        NULL | Number of pages in the index      |
| test          | t1         | status     | 2018-01-31 12:28:17 | n_diff_pfx01 |          4 |           9 | status                            |
| test          | t1         | status     | 2018-01-31 12:28:17 | n_diff_pfx02 |      12000 |           9 | status,id                         |
| test          | t1         | status     | 2018-01-31 12:28:17 | n_leaf_pages |          9 |        NULL | Number of leaf pages in the index |
| test          | t1         | status     | 2018-01-31 12:28:17 | size         |         10 |        NULL | Number of pages in the index      |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+

***Indexes for table t2***

MariaDB [test]> select * from mysql.innodb_index_stats where database_name='test' and table_name='t2';
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| test          | t2         | PRIMARY    | 2018-01-31 12:32:54 | n_diff_pfx01 |       4000 |          14 | id                                |
| test          | t2         | PRIMARY    | 2018-01-31 12:32:54 | n_leaf_pages |         14 |        NULL | Number of leaf pages in the index |
| test          | t2         | PRIMARY    | 2018-01-31 12:32:54 | size         |         15 |        NULL | Number of pages in the index      |
| test          | t2         | fk_t1_id   | 2018-01-31 12:32:54 | n_diff_pfx01 |        984 |           5 | t1_id                             |
| test          | t2         | fk_t1_id   | 2018-01-31 12:32:54 | n_diff_pfx02 |       4000 |           5 | t1_id,id                          |
| test          | t2         | fk_t1_id   | 2018-01-31 12:32:54 | n_leaf_pages |          5 |        NULL | Number of leaf pages in the index |
| test          | t2         | fk_t1_id   | 2018-01-31 12:32:54 | size         |          6 |        NULL | Number of pages in the index      |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+

Number of rows in t1

MariaDB [test]> select count(*) from t1;
+----------+
| count(*) |
+----------+
|    12000 |
+----------+

Number of rows in t2

MariaDB [test]> select count(*) from t2;
+----------+
| count(*) |
+----------+
|     4000 |
+----------+

Query in question

MariaDB [test]> explain select sql_no_cache status from t1 join t2 on t1.id = t2.t1_id where status='a' \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: index
possible_keys: fk_t1_id
          key: fk_t1_id
      key_len: 5
          ref: NULL
         rows: 4000
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: eq_ref
possible_keys: PRIMARY,status
          key: PRIMARY
      key_len: 4
          ref: test.t2.t1_id
         rows: 1
        Extra: Using where

Now, we update 1000 rows with b value. After that it should be using the status index because it's less expensive to examine 3039 rows.

MariaDB [test]> update t1 set status='b' where status='a' limit 1000;
Query OK, 1000 rows affected (0.44 sec)
Rows matched: 1000  Changed: 1000  Warnings: 0

MariaDB [test]> select count(*) from t1 where status='a';
+----------+
| count(*) |
+----------+
|     3039 |
+----------+

MariaDB [test]> explain select sql_no_cache status from t1 where status='a' \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: ref
possible_keys: status
          key: status
      key_len: 2
          ref: const
         rows: 4038
        Extra: Using where; Using index

MariaDB [test]> explain select sql_no_cache status from t1 join t2 on t1.id = t2.t1_id where status='a' \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: index
possible_keys: fk_t1_id
          key: fk_t1_id
      key_len: 5
          ref: NULL
         rows: 4000
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: eq_ref
possible_keys: PRIMARY,status
          key: PRIMARY
      key_len: 4
          ref: test.t2.t1_id
         rows: 1
        Extra: Using where

As you can see the query execution plan seems to be the same, it still assumes that 4000 rows have to be examined.

MariaDB [test]> select sql_no_cache status from t1 join t2 on t1.id = t2.t1_id where status='a';
Empty set (0.00 sec)

MariaDB [test]> show status like '%handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 1     |
| Handler_read_key           | 984   |
| Handler_read_last          | 0     |
| Handler_read_next          | 4000  |
| Handler_read_prev          | 0     |
| Handler_read_retry         | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+

Now let's alter the table t1.

MariaDB [test]> alter table t1 engine=innodb;
Query OK, 0 rows affected (1.60 sec)
Records: 0  Duplicates: 0  Warnings: 0

The execution plan has changed.

MariaDB [test]> explain select sql_no_cache status from t1 where status='a' \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: ref
possible_keys: status
          key: status
      key_len: 2
          ref: const
         rows: 3038
        Extra: Using where; Using index

MariaDB [test]> explain select sql_no_cache status from t1 join t2 on t1.id = t2.t1_id where status='a' \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: ref
possible_keys: PRIMARY,status
          key: status
      key_len: 2
          ref: const
         rows: 3038
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: ref
possible_keys: fk_t1_id
          key: fk_t1_id
      key_len: 5
          ref: test.t1.id
         rows: 2
        Extra: Using index

MariaDB [test]> select sql_no_cache status from t1 join t2 on t1.id = t2.t1_id where status='a' \G;
Empty set (0.01 sec)

MariaDB [test]> show status like '%handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 3040  |
| Handler_read_last          | 0     |
| Handler_read_next          | 3039  |
| Handler_read_prev          | 0     |
| Handler_read_retry         | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+

Best Answer

Current (that is, before Histograms are implemented), the MySQL and MariaDB Optimizers have only a single metric for "cardinality" -- that is, they assume all value occur equally frequently. To top it off, ANALYZE and the stats, only approximate the cardinality. (Histograms are coming.)

The two SELECTs you presented ran the "best" way -- using the index.

EXPLAIN "Rows" is often inaccurate.

"Status" (and other flag-like) columns are rarely worth indexing by itself. It may be more useful to have status as the first column in a multi-column ("composite") index.

SELECT COUNT(*) is not a good example for making an argument -- unless that really is the query you use.

(Responding to the Edited Question)...

  • Rows in EXPLAIN is an estimate of the number of rows that need to be read. It is rarely the exact number of "rows examined". "Rows examined" can be found in the slowlog.

  • To estimate cardinality, InnoDB "dives" into the BTree to make some (configurable) number of probes. Depending on how the data is distributed, these probes may or may not reflect the actual cardinality.

  • The existence of INDEX(status) led you to jump to some conclusions (I think). This also leads to exactly 3040 showing up in several outputs. (Or within 1 -- find the first row, then "read_next 3039 times".)

  • 20 rows with cardinality of 4 means that about 5 (not 4) copies of each value exist.

  • Cardinality implies that equal number of rows exists for each possible value, not that every nth will have that value.

  • JOINs -- The only thing the Optimizer can do is assume 'even' distribution of values. So, things can get really muddy.