MySQL Query Performance – Optimizer Not Using Indexes

indexMySQLmysql-5.6mysql-5.7performancequery-performance

In our production environment (mySQL 5.6.40 on Amazon RDS), for certain queries the query optimizer ignores table indexes and takes several minutes to complete, whereas in our non-RDS test environment (mySQL 5.7), the same queries take a fraction of a second – due to using the indexes.

Rather than immediately upgrading from mySQL 5.6.40 to 5.7 on RDS (assuming the optimizer simply works better!), I wondered if there was something we're missing as the queries appear straight forward… I'm a bit surprised the optimizer doesn't pick the indexes.

Example query:

select id, username, agent, app_instance_id, end_time, event_time, 
last_modified 
from event where username='joebloggs' and app_instance_id<>'ad69469a- 
d73f-4b33-a1c5-bde03c765d89' and last_modified>'2018-08-20 20:14:51.394' 
order by last_modified asc limit 20

Execution time:

  • mySQL 5.6.40: 6m+
  • mySQL 5.7: 0.02s

EXPLAIN:

On mySQL 5.6.40 EXPLAIN shows that the optimizer doesn't use any of the indexes (5th column 'possible keys', 9th column 'extra')

'1','SIMPLE','event','range','last_modified,username,appinstanceid,user_app_last','last_modified','6',NULL,'1128956','Using where'

Whereas on mySQL 5.7 EXPLAIN shows the optimizer picks an index.

'1','SIMPLE','event',NULL,'range','username,last_modified,appinstanceid,user_app_last','last_modified','6',NULL,'1030540','0.54','Using index condition; Using where'

Long story short, our RDS instance used to run mySQL 5.5 and all worked fine… but with a forced upgrade to 5.6.40 due to end of life on 5.5 this problem has bubbled to the surface.

I've also run ANALYZE TABLE to ensure stats etc updated for optimizer. Any insight would be greatly appreciated. Thanks!

UPDATE:

  1. Our client app is using JPA, which can be configured to use index hints. Is it simply that the query optimizer is better on 5.7 and that we should upgrade, or should we be providing hints after all.. and not relying on the query optimizer?

  2. After running further tests on a new TEST instance (mySQL 5.7.23 upgraded from 5.6.40) on RDS it appears to be hit and miss on the same query as to whether the optimizer chooses to use indexes.

Best Answer

In any system, this will probably be beneficial, probably optimal:

INDEX(username, last_modified) -- in this order

Please provide SHOW CREATE TABLE so we can see what indexes, datatypes, engine, etc, you have.

The index I recommend

  • Is a 'composite' index.
  • Is not the same as two single-column indexes.
  • Deliberately starts with the column tested via =.
  • Ignores the <> as not being useful.
  • Includes one 'range' (on last_modified).
  • It should, but may not consume the ORDER BY, thereby allowing it to efficiently consume the LIMIT. Without reaching all the way to the LIMIT, it is destined to scan the entire table, thereby taking a 'long' time.

A side note: UUIDs (such as what app_instance_id seems to be) are notoriously inefficient when indexed in large tables. For this query, I am hoping that very few of the rows have that particular value.

To further investigate the "why", answer these questions:

  • What percentage of the table contains username='joebloggs'. and
  • What percentage of the table contains last_modified>'2018-08-20 20:14:51.394'

If either of these is more than say, 20%, then the Optimizer wisely chose to ignore the corresponding index. It could be that the statistics on one was better than on the other. Let's see SHOW INDEXES FROM event.

Cookbook for devising good indexes.

Try to avoid index hints -- they may help today, but hurt tomorrow (with different constants).