Mysql – From where does the MySQL Query Optimizer read index statistics

index-statistics, MySQL

I'm trying to determine where the MySQL optimizer obtains the list of indexes available for a table when it prepares a query and estimates its cost.

Best Answer

The direct answer for this would be

information_schema.statistics

mysql> desc information_schema.statistics;
+---------------+---------------+------+-----+---------+-------+
| Field         | Type          | Null | Key | Default | Extra |
+---------------+---------------+------+-----+---------+-------+
| TABLE_CATALOG | varchar(512)  | NO   |     |         |       |
| TABLE_SCHEMA  | varchar(64)   | NO   |     |         |       |
| TABLE_NAME    | varchar(64)   | NO   |     |         |       |
| NON_UNIQUE    | bigint(1)     | NO   |     | 0       |       |
| INDEX_SCHEMA  | varchar(64)   | NO   |     |         |       |
| INDEX_NAME    | varchar(64)   | NO   |     |         |       |
| SEQ_IN_INDEX  | bigint(2)     | NO   |     | 0       |       |
| COLUMN_NAME   | varchar(64)   | NO   |     |         |       |
| COLLATION     | varchar(1)    | YES  |     | NULL    |       |
| CARDINALITY   | bigint(21)    | YES  |     | NULL    |       |
| SUB_PART      | bigint(3)     | YES  |     | NULL    |       |
| PACKED        | varchar(10)   | YES  |     | NULL    |       |
| NULLABLE      | varchar(3)    | NO   |     |         |       |
| INDEX_TYPE    | varchar(16)   | NO   |     |         |       |
| COMMENT       | varchar(16)   | YES  |     | NULL    |       |
| INDEX_COMMENT | varchar(1024) | NO   |     |         |       |
+---------------+---------------+------+-----+---------+-------+
16 rows in set (0.01 sec)

You could SELECT from that table with

SELECT * FROM information_schema.statistics
WHERE table_schema='mydb' AND table_name='mytable';

or see the statistics by doing

SHOW INDEXES FROM mydb.mytable;
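
If only the index figures are of interest, a narrower query is easier to read. This is a minimal sketch using the same placeholder names (mydb, mytable) as above; the columns are taken from the DESC output shown earlier.

```sql
-- One row per indexed column, in index order, with the estimated
-- number of distinct values (CARDINALITY) reported for it.
SELECT index_name,
       seq_in_index,
       column_name,
       non_unique,
       cardinality
FROM information_schema.statistics
WHERE table_schema = 'mydb'      -- placeholder schema name
  AND table_name   = 'mytable'   -- placeholder table name
ORDER BY index_name, seq_in_index;
```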

Please keep in mind that this table is not always accurate in a write-heavy environment. You will have to run ANALYZE TABLE periodically against all MyISAM tables that are updated frequently. Otherwise, the MySQL Query Optimizer, which relies on information_schema.statistics, can sometimes make bad choices when building execution plans for queries. Index statistics must be as up-to-date as possible.
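
As a rough sketch of that maintenance step, again assuming the placeholder names mydb.mytable, the statistics can be refreshed and then inspected like this:

```sql
-- Recompute and store the key distribution for the table.
ANALYZE TABLE mydb.mytable;

-- Inspect the refreshed values in the Cardinality column.
SHOW INDEXES FROM mydb.mytable;
```

ANALYZE TABLE holds a read lock on a MyISAM table while it runs, so it is usually scheduled for a quiet period.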

ANALYZE TABLE has ABSOLUTELY NO EFFECT against InnoDB tables. All index statistics for InnoDB are computed on demand by means of dives into the BTREE pages. Therefore, when you run SHOW INDEXES FROM against an InnoDB table, the cardinalities displayed are always approximations.
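
One way to see how approximate the figures are is to compare the reported cardinality with an exact count. The sketch below assumes a hypothetical column mycol that is the leading column of an index on mydb.mytable. The exact count has to read the whole index or table, so it may be slow on large tables.

```sql
-- Estimated cardinality reported for the (hypothetical) column mycol.
SELECT index_name, column_name, cardinality
FROM information_schema.statistics
WHERE table_schema = 'mydb'
  AND table_name   = 'mytable'
  AND column_name  = 'mycol'
  AND seq_in_index = 1;          -- leading column of an index

-- Exact number of distinct non-NULL values, for comparison.
SELECT COUNT(DISTINCT mycol) AS exact_cardinality
FROM mydb.mytable;

-- InnoDB settings that influence how the estimates are sampled.
SHOW VARIABLES LIKE 'innodb_stats%';
```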

UPDATE 2011-06-21 12:17 EDT

To clarify the point about ANALYZE TABLE, let me rephrase. Running ANALYZE TABLE on InnoDB tables is completely useless. Even if you run ANALYZE TABLE on an InnoDB table, the InnoDB storage engine performs dives into the index for cardinality approximations over and over again, thus trashing the statistics you just compiled. In fact, Percona performed some tests on ANALYZE TABLE and came to that conclusion as well. Note that this describes InnoDB as of this update; MySQL 5.6 and later can store InnoDB statistics persistently (innodb_stats_persistent), and in that mode ANALYZE TABLE does refresh them.

Related Solutions

Mysql – How does the MySQL Query Optimizer react to a SELECT COUNT sentence

The two queries have a very big difference:

-- query 1
SELECT COUNT(*) 
FROM customers 
WHERE ID > 10000 
  AND country = 'US' ;

-- query 2
SELECT * 
FROM customers 
WHERE ID > 10000 
  AND country = 'US' ;

While the second query returns all rows that match the WHERE conditions, the first one has an aggregate function (COUNT()) in the SELECT list, so it performs an aggregation: it collapses the rows that match the conditions into a single row and returns only one number, the count of matching rows.

So, for the first query, there is no sensible reason to have an ORDER BY. The result is one row only. Moreover, it should produce an error, because the rows that have been collapsed into one may have different values in the country and created_at columns. So which value should be used for the ordering (say, in a case where you had a GROUP BY and the result set had more than one row)?

You can test at SQL-Fiddle that SQL Server, when you add ORDER BY country, created_at, produces the error:

Column "customers.country" is invalid in the ORDER BY clause because it is not contained in either an aggregate function or the GROUP BY clause.

An error is produced in Postgres, too.

But even in MySQL, which may allow such non-standard syntax (adding ORDER BY to the first query), the optimizer is smart enough not to take it into account for the execution plan. There is nothing to order. One row will be returned anyway. You can check that by viewing the execution plans with EXPLAIN. Simple test at SQL-Fiddle: Mysql-test
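
A sketch of that check, using the customers table from the two queries above and the ORDER BY country, created_at clause discussed here. Whether the server accepts the ORDER BY at all may depend on the MySQL version and sql_mode, so treat this as something to try rather than a guaranteed result.

```sql
-- Plan for the aggregate query on its own.
EXPLAIN
SELECT COUNT(*)
FROM customers
WHERE ID > 10000
  AND country = 'US';

-- Same query with the redundant ORDER BY added; compare the Extra
-- column of both plans and look for "Using filesort".
EXPLAIN
SELECT COUNT(*)
FROM customers
WHERE ID > 10000
  AND country = 'US'
ORDER BY country, created_at;
```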

Oracle (version 11g2) seems to allow such nonsense too. You can see the execution plan here: Oracle-test. Not sure how the plan should be interpreted but it seems that Oracle at least knows that it's one row only so the "sorting" operation is not costly.

MySQL query optimizer ignoring smaller scan on TIMESTAMP column, cardinality

For these queries, if your WHERE is as you have shown and you also have ORDER BY rf_timestamp, you can use this index, which should be far better than a single-column index on si_id or a single-column index on rf_timestamp:

ALTER TABLE rf
  ADD INDEX si_id__rf_timestamp__IX         -- choose a name for the index
    (si_id, rf_timestamp) ;

With a table of this size, adding this index will take some time and the table will be locked in the meantime, so it would be better to do this when there is not much traffic or work by others in the database.
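
To check that the optimizer picks up the new index, EXPLAIN can be run on a query of the shape described. The original query is not shown here, so the filter value below (si_id = 1) is a made-up placeholder:

```sql
-- Hypothetical query: equality on si_id, ordered by rf_timestamp.
EXPLAIN
SELECT *
FROM rf
WHERE si_id = 1              -- placeholder value
ORDER BY rf_timestamp;
```

If the composite index is chosen, the key column of the plan should typically show si_id__rf_timestamp__IX and Extra should not list Using filesort, since the index already returns the rows for one si_id in rf_timestamp order.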
