Mysql – If I query records matching some value, why does InnoDB examine most of the records that had that value once, but have changed since then


We have an issue with MySQL InnoDB performance.
We insert records into a table with a status field value of zero, these records need to be processed by another task. When this another task starts processing them, it changes this status field to 1, then in case of an exception, the status goes to 2 or in case of successful processing, the status is set to 3.
The records remain in this table for three days then they are deleted. This way, the record count grows to around 1.7M and remains there.

If I run this:

describe SELECT * FROM temptable where status=0

It shows around 900k examined records, and the query itself returns 0 matching rows in about 5 seconds. There is an index on the status field. If I drop this index and recreate it, then the examined rows value goes down to 1 and starts to grow again, as new records are inserted.

If I query records with the STATUS value of 2 (Exception state), and none of these records existed ever, then the examined row count stays at 1 and the query returns instantly.

SELECT * FROM temptable where status=0
0 row(s) returned   5.538 sec / 0.000 sec
SELECT * FROM temptable where status=2
0 row(s) returned   0.000 sec / 0.000 sec

You can reproduce the same behaviour by creating a table like this:

CREATE TABLE temptable (
    someprimary int(11) NOT NULL AUTO_INCREMENT,
    status int(11) NOT NULL,
    PRIMARY KEY (someprimary),
    KEY IND_STATUS (status)

Fill it with around 100000 zero values, then update some zeros to ones:

update temptable set STATUS=1 where mod(someprimary,11)=0  

Set the remaining zeros to 3:

update temptable set STATUS=3 where STATUS=0

Now there are some STATUS=1 records, many STATUS=3 records and none with the value 0 or 2. You can now run the aforementioned selects and see for yourself if the query where we search for 2 returns instantly and the search for 0 requires some noticeable time to complete.
Is there any way to solve this issue and get the results instantly if there are only between 0-10 records matching the query and there is an index on this field?


SELECT * FROM temptable where status=0;
Empty set (1.56 sec)

describe SELECT * FROM temptable where status=0;  
| id | select_type | table     | type | possible_keys | key        | key_len | ref   | rows   | Extra       |  
|  1 | SIMPLE      | temptable | ref  | IND_STATUS    | IND_STATUS | 4       | const | 144651 | Using index |  
1 row in set (0.00 sec)

SELECT * FROM temptable where status=2;  
Empty set (0.00 sec)  
describe SELECT * FROM temptable where status=2;  
| id | select_type | table     | type | possible_keys | key        | key_len | ref   | rows | Extra       |  
|  1 | SIMPLE      | temptable | ref  | IND_STATUS    | IND_STATUS | 4       | const |    1 | Using index |  
1 row in set (0.00 sec)
(total record count is now 289303)  
SELECT count(1) db, STATUS FROM temptable group by STATUS;
| db     | STATUS |
|  25808 |      1 |
| 262864 |      3 |
2 rows in set (1.55 sec)

Thank you!

Best Answer

DESCRIBE, alias EXPLAIN, estimates the number of rows. In your case (frequent changing of status and deletion), the number are very likely to be very far off. Do not trust it.

EXPLAIN gets its information from "statistics" about the distribution of the data in the table. The statistics are updated at certain times. Version 5.7 does the update at quite different times than 5.5. (What version are you running?)

You have INDEX(status)? Perhaps you should not; it is adding to the problem and not helping the solution in a consistent way. How often are the status changes? The DELETEs? There may be workarounds.

The Optimizer will look at the statistics for status and incorrectly decided that status=0 was too common for using the INDEX, so it did a table scan instead. On the other hand, the stats said that status=2 was rare, so it did use the index and 'instantly' discovered zero rows.

Alas, running an EXPLAIN, even immediately after running the equivalent SELECT does not necessarily give you the explain plan for that select. (I suspect this explains the mysterious results you got. I have almost never seen such happen.)

Your test case is not quite correct -- You have only 2 columns, the secondary key and the PRIMARY KEY, so you got Using index; This is unlikely to happen in the 'real' table -- don't you have other fields?

Please try one more thing to help get to the root of this. Rerun the tests with

SELECT ... ;    -- (returning emptyset)

This will show whether it actually did an index (or table) scan, and was not just a caching problem, as hinted by @jkavalik .