Mysql – Help Optimizing Query

MySQLmysql-5mysql-5.1optimization

For the following table structure in MySQL 5.1.49:

CREATE TABLE `leads` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `institution_id` int(10) unsigned NOT NULL,
  `lender_id` int(10) unsigned NOT NULL,
  `product_id` int(10) unsigned NOT NULL,
  `client_id` int(10) unsigned NOT NULL,
  `contract_id` int(10) unsigned DEFAULT NULL,
  `employee_id` int(11) unsigned DEFAULT NULL,
  `create_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `inquiry_date` timestamp NULL DEFAULT NULL,
  `claimed_date` timestamp NULL DEFAULT NULL,
  `refunded` timestamp NULL DEFAULT NULL,
  `price` decimal(10,2) unsigned NOT NULL,
  `downloaded` int(11) NOT NULL DEFAULT '0',
  `status` enum('in_review','declined','pre-approved') DEFAULT NULL,
  `amount` bigint(20) DEFAULT NULL,
  `response` longtext,
  `pushed` tinyint(1) NOT NULL DEFAULT '0',
  `priority` tinyint(2) NOT NULL,
  `disabled` tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  UNIQUE KEY `unq_lead` (`institution_id`,`client_id`,`product_id`),
  KEY `status` (`status`),
  KEY `fk-product-leads` (`product_id`),
  KEY `fk-contract-leads` (`contract_id`),
  KEY `fk-lender-leads` (`lender_id`),
  KEY `fk-client-leads` (`client_id`),
  KEY `fk-employee-leads` (`employee_id`),
  CONSTRAINT `fk-client-leads` FOREIGN KEY (`client_id`) REFERENCES `clients` (`id`) ON DELETE CASCADE,
  CONSTRAINT `fk-contract-leads` FOREIGN KEY (`contract_id`) REFERENCES `contracts` (`id`),
  CONSTRAINT `fk-employee-leads` FOREIGN KEY (`employee_id`) REFERENCES `users` (`id`) ON UPDATE CASCADE,
  CONSTRAINT `fk-institution-leads` FOREIGN KEY (`institution_id`) REFERENCES `institutions` (`id`) ON DELETE CASCADE,
  CONSTRAINT `fk-lender-leads` FOREIGN KEY (`lender_id`) REFERENCES `lenders` (`id`),
  CONSTRAINT `fk-product-leads` FOREIGN KEY (`product_id`) REFERENCES `products` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5930472 DEFAULT CHARSET=utf8

Is there any way to further optimize this query:

SELECT * 
FROM leads
INNER JOIN (
    SELECT MIN(id) AS 'min_id', `leads`.`institution_id`, `leads`.`client_id`
    FROM `leads`
    WHERE (claimed_date IS NOT NULL)
    AND institution_id = 224
    GROUP BY `institution_id`,`client_id`
    HAVING (COUNT(*) > 0)
) AS `cl` ON leads.institution_id = cl.institution_id AND leads.client_id = cl.client_id AND leads.id != cl.min_id
WHERE (leads.disabled = 0);

These are the results I get currently from EXPLAIN:

id  select_type  table       type  possible_keys                                       key       key_len  ref                             rows  Extra        
1   PRIMARY      <derived2>  ALL                                                                                                          5832               
1   PRIMARY      leads       ref   unq_lead,fk-client-leads,fk-institution-leads,test  unq_lead  8        cl.institution_id,cl.client_id  1     Using where  
2   DERIVED      leads       ref   unq_lead,fk-institution-leads                       unq_lead  4                                        6013  Using where 

UPDATE

So I have rewritten the query so that it appears like so:

SELECT *
FROM leads
INNER JOIN (
SELECT MIN(id) AS `min_id`, `leads`.`institution_id`, `leads`.`client_id`
FROM `leads`
WHERE (claimed_date IS NOT NULL)
AND (institution_id = 224)
GROUP BY `client_id`
) AS `cl` ON leads.institution_id = cl.institution_id AND leads.client_id = cl.client_id AND leads.id <> cl.min_id
WHERE (leads.disabled = 0)
AND (leads.institution_id = 224);

I have also tried adding the following additional indexes:

ALTER TABLE leads
    ADD INDEX test (disabled, institution_id, client_id),
    ADD INDEX test2 (claimed_date, institution_id, client_id);

But I still get the following from EXPLAIN:

id  select_type  table       type  possible_keys                  key       key_len  ref                 rows   Extra        
1   PRIMARY      <derived2>  ALL                                                                         5718   Using where  
1   PRIMARY      leads       ref   unq_lead,fk-client-leads,test  unq_lead  8        const,cl.client_id  1      Using where  
2   DERIVED      leads       ref   unq_lead,test2                 unq_lead  4                            12304  Using where  

Best Answer

  • Remove the HAVING COUNT(*) > 0. It's useless, no row will have a count of 0 after a group by.

  • Change the GROUP BY to: GROUP BY client_id. Grouping by institution_id is not needed, you already have a WHERE condition that narrows it one value.

  • As @HLGEM suggested, remove the select * and use a list of fields that you need. Right now you are repeating data in the client_id field and that is wasteful of server and network resources.

So the query becomes:

SELECT le.*                  --- only the fields you need here
                             --- for example `institution_id` is 224, so
                             --- there is no need to include that
FROM leads AS le
  INNER JOIN (
    SELECT MIN(id) AS min_id, institution_id, client_id
    FROM leads 
    WHERE claimed_date IS NOT NULL
      AND institution_id = 224
    GROUP BY client_id
  ) AS cl 
      ON  le.institution_id = cl.institution_id 
      AND le.client_id = cl.client_id 
      AND le.id <> cl.min_id
WHERE le.disabled = 0 ;
  • Then add an index on (claimed_date, institution_id, client_id) to speed up the nested subquery.

  • If that doesn't really improve the speed much, I think an index on (disabled, institution_id, client_id) would help the joining.

You could also rewrite the query as:

SELECT le.* 
FROM leads AS le
  INNER JOIN (
    SELECT MIN(id) AS min_id, client_id
    FROM leads 
    WHERE claimed_date IS NOT NULL
      AND institution_id = 224
    GROUP BY client_id
  ) AS cl 
      ON  le.client_id = cl.client_id 
      AND le.id <> cl.min_id
WHERE le.disabled = 0 
  AND le.institution_id = 224;