A query is painstakingly slow in both production environment and (fortunately reproducable) on my local dev box. Production is Linux and my dev is currently Windows 10 with MySQL 5.7.11:
SHOW VARIABLES LIKE "%version%";
'innodb_version', '5.7.11'
'protocol_version', '10'
'slave_type_conversions', ''
'tls_version', 'TLSv1,TLSv1.1'
'version', '5.7.11-log'
'version_comment', 'MySQL Community Server (GPL)'
'version_compile_machine', 'x86_64'
'version_compile_os', 'Win64'
The tables are InnoDB. The query involves just one table and is generated by Django
ORM. Here is the table:
select column_name, column_type, is_nullable
from information_schema.columns
where table_name='main_contactinfo'
and table_schema='live040616'
id int(11) NO
first_name varchar(100) YES
last_name varchar(100) YES
club_name varchar(100) YES
email varchar(254) YES
team_name_id int(11) YES
address varchar(100) YES
city varchar(100) YES
state varchar(100) YES
zip varchar(20) YES
contact_type varchar(100) YES
player_id int(11) YES
last_modified datetime NO
object_id varchar(300) YES
sync_inactive tinyint(1) NO
customer_account_id int(11) YES
is_asgr tinyint(1) NO
relation varchar(150) YES
note longtext YES
twitter varchar(100) YES
facebook varchar(100) YES
other longtext YES
aau longtext YES
aau_coachnum varchar(300) YES
hs_coachnum varchar(300) YES
parent_status varchar(300) YES
instagram varchar(100) YES
The table has about 21702503 record in it. The query:
SELECT main_contactinfo.id, main_contactinfo.first_name, main_contactinfo.last_name,
main_contactinfo.club_name, main_contactinfo.email, main_contactinfo.team_name_id,
main_contactinfo.address, main_contactinfo.city, main_contactinfo.state,
main_contactinfo.zip, main_contactinfo.contact_type, main_contactinfo.relation,
main_contactinfo.player_id, main_contactinfo.last_modified,
main_contactinfo.object_id, main_contactinfo.sync_inactive,
main_contactinfo.customer_account_id, main_contactinfo.is_asgr,
main_contactinfo.note, main_contactinfo.twitter, main_contactinfo.facebook,
main_contactinfo.instagram, main_contactinfo.other, main_contactinfo.aau,
main_contactinfo.aau_coachnum, main_contactinfo.hs_coachnum,
main_contactinfo.parent_status
FROM main_contactinfo
WHERE ( main_contactinfo.customer_account_id = 12345
AND ( (main_contactinfo.sync_inactive = 1
AND main_contactinfo.last_modified >= '2016-04-17 00:16:55')
OR (main_contactinfo.last_modified > '2015-09-28 17:40:21'
AND main_contactinfo.sync_inactive = 0)
OR (main_contactinfo.sync_inactive = 0
AND main_contactinfo.last_modified = '2015-09-28 17:40:21'
AND main_contactinfo.id > 19968645)
)
)
ORDER BY main_contactinfo.last_modified ASC,
main_contactinfo.id ASC
LIMIT 83;
The WHERE clause operates on the last_modified
, because it's a part of a system where only changes should be returned since a certain time stamp (datetime). The query on my machine takes ~190 seconds, 80 records returned. I see the HDD I/O light lit up during that time, and I'm running two Crucial M550 in RAID0. That's nowhere near today's nVME SSDs, but it's still ~1GBps throughput.
- I was suspicious about the
sync_inactive
(which is a boolean field on the other end of the ORM), that's not covered by an index, creating an index onsync_inactive
didn't help. - I gradually created composite indexes covering more and more columns, but even a
CREATE UNIQUE INDEX composite_index ON main_contactinfo(id, customer_account_id, last_modified, sync_inactive);
doesn't help much. - Can it be a problem that I use a range for datetime?
- Also notice that I have 3
longtext
columns involved in the query (I have no clue why theORM
includes those, I wish it'd do some lazy loading. But I removed those 3 fields from the query and those didn't help either - I wonder if there can be some unwanted conversion character set or collation conversion? What I see is that the character set of the columns are latin1, and collation is latin1_swwedish_ci, while the collation and character set enforced by the ORM side is utf8.
However the main problem seems to be by the OPTIMIZER_TRACE
is that even with a huge index which covers a lot of columns the engine chooses another index, which only covers customer_account_id
. The reason why my covering composite index was not chosen is "unknown". How can I improve that query?
"analyzing_range_alternatives": {
"range_scan_alternatives": [
{
"index": "composite_index",
"chosen": false,
"cause": "unknown"
},
{
"index": "main_contactinfo_f4b91458",
"ranges": [
"12345 <= customer_account_id <= 12345"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": true,
"using_mrr": false,
"index_only": false,
"rows": 42294,
"cost": 50754,
"chosen": true
},
{
"index": "main_contactinfo_last_modified_4769d8a97da686d4_uniq",
"ranges": [
"0x9997391a15 <= last_modified <= 0x9997391a15 AND 19968645 < id",
"0x9997391a15 < last_modified"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"rows": 10272032,
"cost": 1.23e7,
"chosen": false,
"cause": "cost"
}
],
"analyzing_roworder_intersect": {
"usable": false,
"cause": "too_few_roworder_scans"
}
},
More complete optimize trace (unfortunately truncated though):
https://gist.github.com/MrCsabaToth/62ed290b6c8ddd4f40a172a87a8c0f84
Best Answer
For this
WHERE
clause, I would try an index on(customer_account_id, sync_inactive, last_modified, id)
:The
ORDER BY
with e smallLIMIT
complicates things though, so a different order of the columns in the index might be better:(customer_account_id, last_modified, id, sync_inactive)
.