Mysql – Baffled with MySQL LEFT JOIN query that I have to speed it up

MySQLpercona-serverperformance

I am kind of a puzzled with this query trying to optimize it but no way I am able to (due to my limited knowledge as a DBA). So, I asked the experts.

In my setup I have three (primary) tables. contacts, dnc_contacts (dnc = do not call) and callrequests. I have to select the contacts from contacts table, and copy the contact into callrequests table where contact is not equal to contacts from dnc_contacts. It all looks good so far, but since the data is growing, things are getting slower and slower. The original query I am currently using is:

INSERT IGNORE callrequests (contact_id, contact, call_status, campaign_id, phonebook_id, dnc_group_id, created_date, active_call, processed)
SELECT c.id,
       c.contact,
       1,
       '134',
       '70',
       '21',
       '2014-03-14 14:50:58',
       0,
       0
FROM contacts c
LEFT JOIN dnc_contacts d ON c.contact <> d.contact
WHERE phonebook_id = '70' AND d.group_id = '21'
ORDER BY c.id ON DUPLICATE KEY
UPDATE phonebook_id =
VALUES(phonebook_id), dnc_group_id =
VALUES(dnc_group_id), call_status = if(call_status = '2', VALUES(call_status), call_status)

This query is taking around 8 minutes and plus to complete depending on the data in contacts and dnc_contacts table. Schema for all these 3 tables is at http://sqlfiddle.com/#!2/1ec3a

Setup

The setup is a dedicated Xeon machine with 12 cores having 64 GB RAM and 38 GB InnoDB buffer pool size. The DB size is around 6 GB.

Callrequests table has around 5427519 records

Contacts table has around 20498053 records

Dnc_Contacts table has around 4042246 records

Here is the explain output from the select query above,

+----+-------------+-------+------+----------------------------------+------+---------+-------+---------+--------------------------+
| id | select_type | table | type | possible_keys                    | key  | key_len | ref   | rows    | Extra                    |
+----+-------------+-------+------+----------------------------------+------+---------+-------+---------+--------------------------+
|  1 | SIMPLE      | d     | ref  | fk_dnc_contact_dnc_group1_idx,v1 | v1   | 4       | const | 2014474 | Using index              |
|  1 | SIMPLE      | c     | ref  | fk_contact_phonebook1_idx,v1     | v1   | 4       | const | 2218560 | Using where; Using index |
+----+-------------+-------+------+----------------------------------+------+---------+-------+---------+--------------------------+

Is there a way the query can be speed up or maybe if we can change the logic either?

Best Answer

Your query does not do what you think it does. It results in much more rows than you want to because of the cross join. The only reason you haven't noticed it, is that the excessive number of rows produced are not really inserted in the 3rd table due to duplicate keys which trigger either the INSERT IGNORE or the ON DUPLICATE KEY UPDATE

(And you should really keep one of these two, there is absolutely no reason to have both IGNORE and ON DUPLICATE KEY UPDATE. MySQL will either ignore the duplicate keys or try to update the existing row. Does it make sense to ignore someone and talk to him at the same time?)

You probably need:

FROM contacts c
  LEFT JOIN dnc_contacts d 
    ON  d.contact = c.contact
    AND d.group_id = 21
WHERE c.phonebook_id = 70 
  AND d.contact IS NULL

Try the above (just SELECT first, without inserting), to see if it gives you the wanted results.

An index on dnc_contacts (group_id, contact) would help with efficiency.