I am trying to solve this logic problem but my current MySQL schema is giving me a very poor performance with a large dataset.
I have a table which contains almost half a billion telephone records. They are called the Do Not Call list or simply the DNC.
When I start my campaign to dial numbers, I have to check against this list to see if my phone numbers are there in the list or not. If yes, then scrub from the list and do not dial that number.
Here is my dnc table:
CREATE TABLE `dnc_contacts` (
`id` BIGINT(20) NOT NULL AUTO_INCREMENT,
`group_id` INT(11) NOT NULL,
`contact` VARCHAR(45) NOT NULL,
`status` TINYINT(1) NOT NULL DEFAULT '1',
`created_date` TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP,
`admin` TINYINT(1) NOT NULL DEFAULT '0',
`updated_date` TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`,`group_id`),
KEY `fk_dnc_contact_dnc_group1_idx` (`group_id`),
KEY `idx-group_id-contact` (`group_id`,`contact`),
CONSTRAINT `dnc_contacts_copy_ibfk_1` FOREIGN KEY (`group_id`) REFERENCES `dnc_groups` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
The query I am using to compare the dnc table is:
UPDATE contacts c LEFT JOIN dnc_contacts d ON d.contact = c.contact AND d.group_id = '9' SET campaign_id = '525', call_status = 1, active_call = 0, dnc_contact = if(d.contact IS NULL, 0, 1) WHERE phonebook_id IN (920)
My contacts table looks like:
CREATE TABLE `contacts` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`phonebook_id` int(11) NOT NULL,
`campaign_id` int(11) DEFAULT NULL,
`contact` varchar(45) NOT NULL,
`status` tinyint(1) unsigned NOT NULL DEFAULT '1',
`admin` tinyint(1) unsigned NOT NULL DEFAULT '0',
`call_status` tinyint(1) unsigned NOT NULL DEFAULT '1',
`active_call` tinyint(1) unsigned NOT NULL DEFAULT '0',
`test_call` tinyint(1) unsigned NOT NULL DEFAULT '0',
`dnc_contact` tinyint(1) unsigned NOT NULL DEFAULT '0',
`v_node_name` varchar(45) DEFAULT NULL,
`updated_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`created_date` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`,`phonebook_id`),
KEY `fk_contact_phonebook1_idx` (`phonebook_id`),
KEY `idx-phonebook_id-contact` (`id`,`contact`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Running this query for small datasets with 100K to 200K contacts works fine, but with large dataset with more than 300 million records kills the performance. Can't think of a better way to achieve the goal.
Appreciate any advice to improve the logic/schema/query.
The contact
is not unique. I can have the same contact
but under a different group
or a phonebook
. It gets even messier when there are multiple phonebooks
. active_call
and call_status
keep on changing as the numbers are dialled.
Best Answer
Isn't
contact
unique in both tables? If so, get rid ifid
and havePRIMARY KEY(contact)
. There are multiple reasons why this is likely to speed things up.active_call
andcall_status
sound like things that are changed often? That interferes with the setting of thednc_contact
, especially when done in large chunks.I can think of no good reason to even have
dnc_contact
incontacts
. It is redundant information. That is, when you are about to make a call, do theJOIN
to verify that the phone number is not indnc_contact
.If you really need to do the
UPDATE
, here are tips on how to do it is a relatively non-invasive way -- bu chunking.