Mysql – How to speed up a DateTime comparison query in MySQL

datetimeindexMySQL

I am performing the following query on a MySQL table with around a million records ( ~350MB ). It takes roughly 1.5 seconds every time. If I remove the OR last_updated > last_geocoded comparison, the query time improves by an order of magnitude, to less than 100ms. If I remove the last_geocoded = 0 line, the query time stays at roughly 1.5 seconds.

SELECT id, company_id, raw_address
FROM locations
WHERE
    raw_address IS NOT NULL
    AND (
        last_geocoded = 0
        OR last_updated > last_geocoded
    )
LIMIT 25

If I reduce the query to just WHERE last_updated > last_geocoded, the query time is less than 100ms.

SELECT id, company_id, raw_address
FROM locations
WHERE last_updated > last_geocoded
LIMIT 25

How can I speed up the original query? Why is that WHERE statement slowing down my query so much?

Below is the table schema showing my keys:

CREATE TABLE `locations` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `company_id` int(11) unsigned DEFAULT NULL,
  `title` varchar(255) DEFAULT NULL,
  `description` text,
  `raw_address` varchar(255) DEFAULT NULL,
  `street` varchar(255) DEFAULT NULL,
  `city` varchar(255) DEFAULT NULL,
  `city_code` varchar(255) DEFAULT NULL,
  `county` varchar(255) DEFAULT NULL,
  `county_code` varchar(255) DEFAULT NULL,
  `state` varchar(255) DEFAULT NULL,
  `state_code` varchar(255) DEFAULT NULL,
  `country` varchar(255) DEFAULT NULL,
  `country_code` varchar(255) DEFAULT NULL,
  `constituency` varchar(255) DEFAULT NULL,
  `constituency_code` varchar(255) DEFAULT NULL,
  `postal_code` int(5) unsigned zerofill DEFAULT NULL,
  `lat` float(10,6) DEFAULT NULL,
  `lon` float(10,6) DEFAULT NULL,
  `headquarters` tinyint(1) unsigned DEFAULT NULL,
  `research_and_development` tinyint(1) unsigned DEFAULT NULL,
  `manufacturing` tinyint(1) unsigned DEFAULT NULL,
  `distribution` tinyint(1) unsigned DEFAULT NULL,
  `sales` tinyint(1) unsigned DEFAULT NULL,
  `retail` tinyint(1) unsigned DEFAULT NULL,
  `source_id` int(11) unsigned DEFAULT NULL,
  `last_updated` datetime NOT NULL,
  `last_geocoded` datetime NOT NULL,
  PRIMARY KEY (`id`),
  KEY `company_id` (`company_id`),
  KEY `raw_address` (`raw_address`),
  KEY `last_updated` (`last_updated`),
  KEY `last_geocoded` (`last_geocoded`),
  KEY `city` (`city`),
  KEY `county` (`county`),
  KEY `state` (`state`),
  KEY `country` (`country`),
  KEY `postal_code` (`postal_code`)
) ENGINE=InnoDB AUTO_INCREMENT=997680 DEFAULT CHARSET=latin1;

Below is the original query run with EXPLAIN:

"id","select_type","table","type","possible_keys","key","key_len","ref","rows","Extra"
1,"SIMPLE","locations","range","raw_address,last_geocoded","raw_address",258,NULL,509543,"Using where"

Best Answer

Take the 100ms query and LEFT JOIN it back to the locations table:

SELECT B.id, B.company_id, B.raw_address FROM 
(
    SELECT id FROM locations
    WHERE last_updated > last_geocoded LIMIT 25
) A LEFT JOIN locations B USING (id)
WHERE B.raw_address IS NOT NULL;

Give it a Try !!!