For the following table structure in MySQL 5.1.49:
CREATE TABLE `leads` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`institution_id` int(10) unsigned NOT NULL,
`lender_id` int(10) unsigned NOT NULL,
`product_id` int(10) unsigned NOT NULL,
`client_id` int(10) unsigned NOT NULL,
`contract_id` int(10) unsigned DEFAULT NULL,
`employee_id` int(11) unsigned DEFAULT NULL,
`create_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`inquiry_date` timestamp NULL DEFAULT NULL,
`claimed_date` timestamp NULL DEFAULT NULL,
`refunded` timestamp NULL DEFAULT NULL,
`price` decimal(10,2) unsigned NOT NULL,
`downloaded` int(11) NOT NULL DEFAULT '0',
`status` enum('in_review','declined','pre-approved') DEFAULT NULL,
`amount` bigint(20) DEFAULT NULL,
`response` longtext,
`pushed` tinyint(1) NOT NULL DEFAULT '0',
`priority` tinyint(2) NOT NULL,
`disabled` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `unq_lead` (`institution_id`,`client_id`,`product_id`),
KEY `status` (`status`),
KEY `fk-product-leads` (`product_id`),
KEY `fk-contract-leads` (`contract_id`),
KEY `fk-lender-leads` (`lender_id`),
KEY `fk-client-leads` (`client_id`),
KEY `fk-employee-leads` (`employee_id`),
CONSTRAINT `fk-client-leads` FOREIGN KEY (`client_id`) REFERENCES `clients` (`id`) ON DELETE CASCADE,
CONSTRAINT `fk-contract-leads` FOREIGN KEY (`contract_id`) REFERENCES `contracts` (`id`),
CONSTRAINT `fk-employee-leads` FOREIGN KEY (`employee_id`) REFERENCES `users` (`id`) ON UPDATE CASCADE,
CONSTRAINT `fk-institution-leads` FOREIGN KEY (`institution_id`) REFERENCES `institutions` (`id`) ON DELETE CASCADE,
CONSTRAINT `fk-lender-leads` FOREIGN KEY (`lender_id`) REFERENCES `lenders` (`id`),
CONSTRAINT `fk-product-leads` FOREIGN KEY (`product_id`) REFERENCES `products` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5930472 DEFAULT CHARSET=utf8
Is there any way to further optimize this query:
SELECT *
FROM leads
INNER JOIN (
SELECT MIN(id) AS 'min_id', `leads`.`institution_id`, `leads`.`client_id`
FROM `leads`
WHERE (claimed_date IS NOT NULL)
AND institution_id = 224
GROUP BY `institution_id`,`client_id`
HAVING (COUNT(*) > 0)
) AS `cl` ON leads.institution_id = cl.institution_id AND leads.client_id = cl.client_id AND leads.id != cl.min_id
WHERE (leads.disabled = 0);
These are the results I get currently from EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL 5832
1 PRIMARY leads ref unq_lead,fk-client-leads,fk-institution-leads,test unq_lead 8 cl.institution_id,cl.client_id 1 Using where
2 DERIVED leads ref unq_lead,fk-institution-leads unq_lead 4 6013 Using where
UPDATE
So I have rewritten the query so that it appears like so:
SELECT *
FROM leads
INNER JOIN (
SELECT MIN(id) AS `min_id`, `leads`.`institution_id`, `leads`.`client_id`
FROM `leads`
WHERE (claimed_date IS NOT NULL)
AND (institution_id = 224)
GROUP BY `client_id`
) AS `cl` ON leads.institution_id = cl.institution_id AND leads.client_id = cl.client_id AND leads.id <> cl.min_id
WHERE (leads.disabled = 0)
AND (leads.institution_id = 224);
I have also tried adding the following additional indexes:
ALTER TABLE leads
ADD INDEX test (disabled, institution_id, client_id),
ADD INDEX test2 (claimed_date, institution_id, client_id);
But I still get the following from EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL 5718 Using where
1 PRIMARY leads ref unq_lead,fk-client-leads,test unq_lead 8 const,cl.client_id 1 Using where
2 DERIVED leads ref unq_lead,test2 unq_lead 4 12304 Using where
Best Answer
Remove the
HAVING COUNT(*) > 0
. It's useless, no row will have a count of0
after a group by.Change the
GROUP BY
to:GROUP BY client_id
. Grouping byinstitution_id
is not needed, you already have aWHERE
condition that narrows it one value.As @HLGEM suggested, remove the
select *
and use a list of fields that you need. Right now you are repeating data in theclient_id
field and that is wasteful of server and network resources.So the query becomes:
Then add an index on
(claimed_date, institution_id, client_id)
to speed up the nested subquery.If that doesn't really improve the speed much, I think an index on
(disabled, institution_id, client_id)
would help the joining.You could also rewrite the query as: