MySQL – Force index degraded performance over time

indexindex-tuningMySQLmysql-5.7performance

Last year we migrated data from our old product environment into a new environment with our new product.

Since the new product was still fresh, we found that some of the new queries were very slow after the migration. and we fixed several of them by adding the 'FORCE INDEX(SOME_INDEX)' to the queries join clauses. this immediately solved our problem and sped up the queries dramatically.

Lately we see some performance degradation in the new product again, and the same queries we optimized last year using the 'FORCE INDEX(SOME_INDEX)' are not faster without the force index part of it (basically using the original query from last year is now faster)

Someone at the office threw to the air the idea that mysql keeps his own statistics of its indexes and as the mysql server keeps working it become more efficient in creating the flow of the query. (meaning that at the beginning it has less statistics and thats why hinting to force index helps it, and after a year of production usage its own statistics are enough).

Is this correct? is there any other plausible explanation for this?

Best Answer

Without any index hint, the Optimizer uses statistics to decide which index to use. And it updates the statistics in a reasonably (but not always) intelligent manner.

With a hint, such as FORCE INDEX, it ignores the statistics and uses the given index if at all possible.

Probably the FORCE was beneficial at first. But later the data in the table shifted due to new rows, changing distribution of values, or the phase of the moon. And now that index is no longer optimal.

Your experience is why I warn people against using hints, saying "it may help today, but hurt tomorrow."

If you would like to discuss the specific query, please provide SHOW CREATE TABLE and EXPLAIN SELECT... It may be that a different composite query, without the hint, will work even better than what you had or have.

More on the history... The Optimizer was heavily worked on in 5.6, 5.7, and 8.0. A big change was moving to a "cost-based" model. In theory, it produces better results. In practice, it rarely changes anything. And, as perhaps your case shows, it sometimes hurts. The presumption is that the cost-based model is usually better than the previous algorithm.

The statistics are updated when any of these happens:

  • The table grows by 10%. (This partially addresses your coworker's theory.)
  • You run ANALYZE TABLE. In rare cases, this is a "fix" for a bad index choice, but it may or may not stick.
  • Some flavors of ALTER.