MySQL – Complex key performance problem (MySQL, InnoDB)

index, index-tuning, MySQL, performance, query-performance

A query is painfully slow both in the production environment and (fortunately reproducible) on my local dev box. Production is Linux and my dev box is currently Windows 10 with MySQL 5.7.11:

SHOW VARIABLES LIKE "%version%";
'innodb_version', '5.7.11'
'protocol_version', '10'
'slave_type_conversions', ''
'tls_version', 'TLSv1,TLSv1.1'
'version', '5.7.11-log'
'version_comment', 'MySQL Community Server (GPL)'
'version_compile_machine', 'x86_64'
'version_compile_os', 'Win64'

The tables are InnoDB. The query involves just one table and is generated by the Django ORM. Here is the table:

select  column_name, column_type, is_nullable
    from  information_schema.columns
    where  table_name='main_contactinfo'
      and  table_schema='live040616'

id  int(11) NO
first_name  varchar(100)    YES
last_name   varchar(100)    YES
club_name   varchar(100)    YES
email   varchar(254)    YES
team_name_id    int(11) YES
address varchar(100)    YES
city    varchar(100)    YES
state   varchar(100)    YES
zip varchar(20) YES
contact_type    varchar(100)    YES
player_id   int(11) YES
last_modified   datetime    NO
object_id   varchar(300)    YES
sync_inactive   tinyint(1)  NO
customer_account_id int(11) YES
is_asgr tinyint(1)  NO
relation    varchar(150)    YES
note    longtext    YES
twitter varchar(100)    YES
facebook    varchar(100)    YES
other   longtext    YES
aau longtext    YES
aau_coachnum    varchar(300)    YES
hs_coachnum varchar(300)    YES
parent_status   varchar(300)    YES
instagram   varchar(100)    YES

The table has about 21,702,503 records in it. The query:

SELECT  main_contactinfo.id, main_contactinfo.first_name, main_contactinfo.last_name,
        main_contactinfo.club_name, main_contactinfo.email, main_contactinfo.team_name_id,
        main_contactinfo.address, main_contactinfo.city, main_contactinfo.state,
        main_contactinfo.zip, main_contactinfo.contact_type, main_contactinfo.relation,
        main_contactinfo.player_id, main_contactinfo.last_modified,
        main_contactinfo.object_id, main_contactinfo.sync_inactive,
        main_contactinfo.customer_account_id, main_contactinfo.is_asgr,
        main_contactinfo.note, main_contactinfo.twitter, main_contactinfo.facebook,
        main_contactinfo.instagram, main_contactinfo.other, main_contactinfo.aau,
        main_contactinfo.aau_coachnum, main_contactinfo.hs_coachnum,
        main_contactinfo.parent_status
FROM main_contactinfo 
WHERE ( main_contactinfo.customer_account_id = 12345 
    AND ( (main_contactinfo.sync_inactive = 1 
           AND main_contactinfo.last_modified >= '2016-04-17 00:16:55') 
       OR (main_contactinfo.last_modified > '2015-09-28 17:40:21' 
           AND main_contactinfo.sync_inactive = 0) 
       OR (main_contactinfo.sync_inactive = 0
           AND main_contactinfo.last_modified = '2015-09-28 17:40:21'
           AND main_contactinfo.id > 19968645)
        )
      ) 
ORDER BY main_contactinfo.last_modified ASC, 
         main_contactinfo.id ASC 
LIMIT 83;

The WHERE clause filters on last_modified because the query is part of a system that should return only the changes since a certain timestamp (datetime). On my machine the query takes ~190 seconds and returns 80 records. I can see the disk I/O light stay lit during that time, and I'm running two Crucial M550 drives in RAID 0. That's nowhere near today's NVMe SSDs, but it's still ~1 GB/s of throughput.
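Not something from the original post, but one way to quantify how much work those 80 rows cost is to look at the session handler counters around the statement; a minimal sketch:

FLUSH STATUS;
-- run the slow SELECT from above here
SHOW SESSION STATUS LIKE 'Handler_read%';

A Handler_read_next (or Handler_read_rnd_next) count in the millions would confirm that the engine is crawling a huge range rather than seeking straight to the 83 requested rows.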

  1. I was suspicious about sync_inactive (a boolean field on the ORM side) because it isn't covered by an index, but creating an index on sync_inactive didn't help.
  2. I gradually created composite indexes covering more and more columns, but even a CREATE UNIQUE INDEX composite_index ON main_contactinfo(id, customer_account_id, last_modified, sync_inactive); doesn't help much (see the sketch after this list for a way to force it and compare).
  3. Could it be a problem that I use a range on the datetime column?
  4. Also notice that there are 3 longtext columns involved in the query (I have no clue why the ORM includes those; I wish it did some lazy loading). I removed those 3 fields from the query, but that didn't help either.
  5. I wonder if there could be some unwanted character set or collation conversion? What I see is that the character set of the columns is latin1 and the collation is latin1_swedish_ci, while the character set and collation enforced by the ORM side are utf8.
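For point 2, one way to check whether the optimizer's rejection of composite_index actually costs anything is to force the index and compare the runtime and row estimates against the default plan. A minimal sketch, with the SELECT list trimmed for brevity (the real query selects every column):

SELECT  id, last_modified, sync_inactive
FROM  main_contactinfo FORCE INDEX (composite_index)
WHERE  customer_account_id = 12345
  AND  (    (sync_inactive = 1 AND last_modified >= '2016-04-17 00:16:55')
         OR (sync_inactive = 0 AND last_modified >  '2015-09-28 17:40:21')
         OR (sync_inactive = 0 AND last_modified =  '2015-09-28 17:40:21'
             AND id > 19968645) )
ORDER BY last_modified ASC, id ASC
LIMIT 83;

If the forced plan is no faster, the "unknown" rejection is probably moot: with id as its leading column, composite_index cannot be used to seek on the customer_account_id = 12345 equality anyway.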

However, the main problem according to the OPTIMIZER_TRACE seems to be that even with a huge index that covers a lot of columns, the engine chooses another index, which covers only customer_account_id. The reason why my covering composite index was not chosen is "unknown". How can I improve this query?

              "analyzing_range_alternatives": {
                "range_scan_alternatives": [
                  {
                    "index": "composite_index",
                    "chosen": false,
                    "cause": "unknown"
                  },
                  {
                    "index": "main_contactinfo_f4b91458",
                    "ranges": [
                      "12345 <= customer_account_id <= 12345"
                    ],
                    "index_dives_for_eq_ranges": true,
                    "rowid_ordered": true,
                    "using_mrr": false,
                    "index_only": false,
                    "rows": 42294,
                    "cost": 50754,
                    "chosen": true
                  },
                  {
                    "index": "main_contactinfo_last_modified_4769d8a97da686d4_uniq",
                    "ranges": [
                      "0x9997391a15 <= last_modified <= 0x9997391a15 AND 19968645 < id",
                      "0x9997391a15 < last_modified"
                    ],
                    "index_dives_for_eq_ranges": true,
                    "rowid_ordered": false,
                    "using_mrr": false,
                    "index_only": false,
                    "rows": 10272032,
                    "cost": 1.23e7,
                    "chosen": false,
                    "cause": "cost"
                  }
                ],
                "analyzing_roworder_intersect": {
                  "usable": false,
                  "cause": "too_few_roworder_scans"
                }
              },

More complete optimizer trace (unfortunately truncated, though):
https://gist.github.com/MrCsabaToth/62ed290b6c8ddd4f40a172a87a8c0f84
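Not part of the original trace, but for reference: the truncation can usually be avoided by raising the trace buffer before re-running the statement. The 1 MB value below is just an illustrative choice; the 5.7 default is small enough that long traces get cut off.

SET optimizer_trace_max_mem_size = 1048576;  -- illustrative 1 MB buffer
SET optimizer_trace = 'enabled=on';

-- re-run the slow SELECT here --

SELECT  TRACE, MISSING_BYTES_BEYOND_MAX_MEM_SIZE
  FROM  information_schema.OPTIMIZER_TRACE;

SET optimizer_trace = 'enabled=off';

A non-zero MISSING_BYTES_BEYOND_MAX_MEM_SIZE means the buffer is still too small.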

Best Answer

For this WHERE clause, I would try an index on (customer_account_id, sync_inactive, last_modified, id):

ALTER TABLE main_contactinfo
  ADD INDEX customer_active_last_modified
  ( customer_account_id, 
    sync_inactive, 
    last_modified, 
    id
  ) ;

The ORDER BY with a small LIMIT complicates things though, so a different order of the columns in the index might be better: (customer_account_id, last_modified, id, sync_inactive).
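A sketch of that alternative ordering (the index name is just illustrative): with the equality on customer_account_id first and last_modified, id matching the ORDER BY, InnoDB can read the rows already in the requested order and stop as soon as 83 of them pass the sync_inactive/last_modified filter.

ALTER TABLE main_contactinfo
  ADD INDEX customer_last_modified_id_active   -- illustrative name
  ( customer_account_id,
    last_modified,
    id,
    sync_inactive
  ) ;

Comparing both variants with EXPLAIN (or another optimizer trace) should show which one the optimizer is actually willing to use for this WHERE/ORDER BY/LIMIT combination.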