Mysql – Why MySQL pick a different index for the same query structure but different values

indexinnodbmysql-5.7query-performance

Right now i have some issues understanding how MySQL picks the index to use on a query. I am executing a "COUNT QUERY", but everytime i use a different foreign key for the query, MySQL picks a different index to use.

Let me elaborate and explain my situation. On our DB table we have around 40 millions rows. I am using two "Count querys" with range filters to get the number of rows. The two clients i use to test this behavior have this values:

  • Cliente_id 1 has around 8 millions rows on reports table.
  • Cliente_id 2 has around 1.5 millions rows on reports table.

Now i execute the following querys:

Count query of client with id 1 (finish in 5.6 seg)

Select count(*) From reports where cliente_id = 1 and (created_at  >= '2020-01-01T00:00:00' and created_at <= '2020-01-31T00:00:00')

Count query of client with id 2 (finish in 58 seg)

Select count(*) From reports where cliente_id = 2 and (created_at  >= '2020-01-01T00:00:00' and created_at <= '2020-01-31T00:00:00')

As you can see the second query of the client with less records, takes too long to execute. This seems odd, so when i try to use "EXPLAIN" to see which index is been use i got this:

EXPLAIN OF query from cliente_id 1 (Use created_at index.)

1   SIMPLE  reports range   index_reports_clienteid,index_reports_created_at    index_reports_created_at    63      5534206 Using index condition; Using where

EXPLAIN OF query from cliente_id 2 (Use cliente_id index.)

1   SIMPLE  reports range   index_reports_clienteid,index_reports_created_at    index_reports_clienteid 63      5534206 Using index condition; Using where

The second Count query is using a different INDEX. I been reading a lot of Mysql doc, but with not luck. After a while i found a way to force the use of a specific index on a Query. I try this:

Selec Count with Use Index

SELECT COUNT(*) FROM reports USE INDEX(index_reports_created_at)  WHERE (company_id = '2' AND (created_at >= '2020-01-01T00:00:00' AND created_at <= '2020-01-31T0:0:00' ))

When i force the use of index on the query it only took around 6 seconds to finish. In theory the most optimal index to use is created_at, but on this situation MySQL picks "cliente_id index".

What is the criteria here to pick cliente_id over created_at? I just don't understand this behavior. I will really appreciated if someone can help me with this situation.

MYSQL VERSION is 5.7.32 and i am using innoDB engine

NOTE. I can hardcode the use index on my querys but i dont think is the right think to do. So i will really like to avoid use "USE INDEX" on my querys.

Note2. Sorry for my poor english. I try my best to explain.

Thanks for the help.

Best Answer

Per your testing based on my comments, it does seem like it's related to the cardinality of the data. I.e. MySQL's Engine thinks a larger date range predicate in your WHERE clause will result in enough data to be returned that the cliente_id is a better index to use when searching for that data. (It may even be doing a Full Index Scan instead of seeking through the index, but this I can't say for sure without seeing more analysis of the execution plan.)

Normally there's a cause behind cardinality issues (e.g. non-SARGable predicates, bad statistics, etc), but unfortunately I'm not an expert on MySQL and from looking at your query it appears to be pretty benevolent. Without any additional information my only guess is it's not any particular root issue, rather it's just a shortcoming of the MySQL Engine in this particular case (which can happen occasionally with larger datasets).

Normally my suggestion would be to cautiously use an index hint (as you already have tried) and to keep an eye on future performance of that query.