MySQL – Query Taking a Long Time

djangoindexMySQL

I have a performance issue with a MySQL query that takes a long time. I'm trying to figure out why.

I'm using Django, a web framework that comes with ORM (Object relational Mappers) and basically generates my SQL statements. Of course I can also write custom SQL if needed.

The query that works but takes a long time is:

SELECT DISTINCT `trackdata_lead`.`id`,
            `trackdata_lead`.`company_name`,
            `trackdata_lead`.`status`,
            `trackdata_lead`.`company_uid`,
            `trackdata_lead`.`company_zipcode`,
            `trackdata_lead`.`company_city`,
            `trackdata_lead`.`company_canton`
FROM   `trackdata_lead`
   INNER JOIN `trackdata_trackrecord`
           ON ( `trackdata_lead`.`id` = `trackdata_trackrecord`.`lead_id` )
WHERE  ( `trackdata_lead`.`site_id` = 41
     AND `trackdata_trackrecord`.`timestamp` >= '2016-12-20 23:00:00'
     AND `trackdata_trackrecord`.`timestamp` <= '2016-12-25 22:59:59' );

This query takes about 19 seconds and returns 531 rows:

531 rows in set (19.29 sec)

I have ran EXPLAIN to check what MySQL is doing:

mysql> EXPLAIN SELECT DISTINCT `trackdata_lead`.`id`, `trackdata_lead`.`company_name`,
`trackdata_lead`.`status`, `trackdata_lead`.`company_uid`, `trackdata_lead`.`company_zipcode`,
`trackdata_lead`.`company_city`, `trackdata_lead`.`company_canton` FROM `trackdata_lead` INNER JOIN
`trackdata_trackrecord` ON (`trackdata_lead`.`id` = `trackdata_trackrecord`.`lead_id`) WHERE
(`trackdata_lead`.`site_id` = 41 AND `trackdata_trackrecord`.`timestamp` >= '2016-12-20 23:00:00' AND
`trackdata_trackrecord`.`timestamp` <= '2016-12-25 22:59:59') \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: trackdata_lead
         type: ref
possible_keys: PRIMARY,trackdata_lead_site_id_d3b7ac3d_fk_customer_site_id
          key: trackdata_lead_site_id_d3b7ac3d_fk_customer_site_id
      key_len: 4
          ref: const
         rows: 4269
        Extra: Using temporary
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: trackdata_trackrecord
         type: ref
possible_keys: trackdata_trackrecord_d7e6d55b,trackdata_trackrecord_8f635513
          key: trackdata_trackrecord_8f635513
      key_len: 5
          ref: permaleads.trackdata_lead.id
         rows: 64
        Extra: Using where; Distinct
2 rows in set (0.00 sec)

I think it takes a long time since it uses a temporary table. I have the following indexes:

mysql> SHOW INDEX FROM trackdata_trackrecord;
+-----------------------+------------+-------------------------------------------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table                 | Non_unique | Key_name                                              | Seq_in_index | Column_name        | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------------+------------+-------------------------------------------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| trackdata_trackrecord |          0 | PRIMARY                                               |            1 | id                 | A         |     3708890 |     NULL | NULL   |      | BTREE      |         |               |
| trackdata_trackrecord |          1 | trackdata_trackrecord_d7e6d55b                        |            1 | timestamp          | A         |     3708890 |     NULL | NULL   |      | BTREE      |         |               |
| trackdata_trackrecord |          1 | trackdata_trackrecord_59fbc8df                        |            1 | ipaddress          | A         |      148355 |     NULL | NULL   |      | BTREE      |         |               |
| trackdata_trackrecord |          1 | trackdata_trackrecord_9365d6e7                        |            1 | site_id            | A         |          84 |     NULL | NULL   |      | BTREE      |         |               |
| trackdata_trackrecord |          1 | trackdata_trackrecord_4a0c0377                        |            1 | matched_inetnum_id | A         |       14835 |     NULL | NULL   | YES  | BTREE      |         |               |
| trackdata_trackrecord |          1 | trackdata_trackrecord_processed_2ab57daab18db765_uniq |            1 | processed          | A         |           2 |     NULL | NULL   |      | BTREE      |         |               |
| trackdata_trackrecord |          1 | trackdata_trackrecord_8f635513                        |            1 | lead_id            | A         |       57951 |     NULL | NULL   | YES  | BTREE      |         |               |
| trackdata_trackrecord |          1 | trackdata_trackrecord_considered_cce21f93_uniq        |            1 | considered         | A         |           2 |     NULL | NULL   |      | BTREE      |         |               |
+-----------------------+------------+-------------------------------------------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
8 rows in set (0.00 sec)

mysql> SHOW INDEX FROM trackdata_lead;
+----------------+------------+------------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table          | Non_unique | Key_name                                                         | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------------+------------+------------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| trackdata_lead |          0 | PRIMARY                                                          |            1 | id          | A         |       29727 |     NULL | NULL   |      | BTREE      |         |               |
| trackdata_lead |          1 | trackdata_le_company_id_b11050a8_fk_companydata_companyrecord_id |            1 | company_id  | A         |       14863 |     NULL | NULL   |      | BTREE      |         |               |
| trackdata_lead |          1 | trackdata_lead_site_id_d3b7ac3d_fk_customer_site_id              |            1 | site_id     | A         |          94 |     NULL | NULL   |      | BTREE      |         |               |
+----------------+------------+------------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)

Is there something I can optimize to gain performance? Any help is highly appreciated.

Best Answer

SELECT  `id`,
        `company_name`,
        `status`,
        `company_uid`,
        `company_zipcode`,
        `company_city`,
        `company_canton`
FROM   `trackdata_lead` AS tl
WHERE  `site_id` = 41
  AND EXISTS( SELECT * FROM `trackdata_trackrecord` AS tr
                WHERE tl.id = tr.lead_id
                  AND `timestamp` >= '2016-12-20 23:00:00'
                  AND `timestamp`  < '2016-12-20 23:00:00' + INTERVAL 6 DAY );

Indexes:

trackdata_lead:   INDEX(site_id)
trackdata_trackrecord:  INDEX(lead_id, timestamp) -- in that order

Notes:

  • JOIN + DISTINCT (or GROUP BY) leads to an explosion or rows, followed by an implosion. By moving the JOIN to EXISTS, both can be eliminated (I think).
  • EXISTS stops when it finds any record, and does not need to look at all matches.
  • The subquery is 'covering' ("Using index"), hence very efficient
  • The rewriting of the timestamp condition is not for speed, but for avoiding some end cases and to avoid computing around leap days, etc.
  • A 'composite' index is not the same as two single-column indexes.
  • INDEX(lead_id) will now be redundant and should be DROPped.
  • There seem to be two 'flags' being indexed; generally such indexes are never used.
  • The EXPLAIN suggests that about 4269*64 rows will be looked at (and put into a tmp table). My formulation should touch only about 4269*2 rows -- scan 4269 rows in one table, plus make 4269 probes into the other. So, I would expect a significant speedup.