MySQL update by inner join no possible keys

amazon-rds; join; mysql; update

Alright, so I'm having trouble getting this big honking query to use indexes. I'm not sure what's up; this is my first time doing an update by way of an inner join, so I'm sure I'm missing something.

The use case: I have a system set up where we load hundreds of thousands of records into a table (campaign_inventory) multiple times a day, overwriting the current data for a particular xml_feed. It works well enough, even if it is kinda slow. But when you're moving around 800,000 to 1,000,000 records all day, kinda slow is good enough. Well, now we need to geocode this data. We can't do it when loading the data, because generating the lat and lon while parsing the XML feeds takes way too long. Additionally, most records don't have zip codes, so we have to use city and state.

So I have a table of zip code info containing the lat and lon for each zip, city, and state. I'm trying to update all the records for a feed at once by way of an inner join query.

The problem: The query takes about 30 seconds for 400 records (some feeds are over 400,000 records), and EXPLAIN shows no possible_keys for the zip_codes table.

The Tables:
So we have two tables, campaign_inventory and zip_codes. Here are the CREATE statements:

campaign_inventory:

CREATE TABLE `campaign_inventory` (
  `id` binary(16) NOT NULL,
  `item_name` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `enabled` int(11) DEFAULT NULL,
  `item_type` varchar(45) CHARACTER SET latin1 DEFAULT NULL,
  `inventory_copy` text COLLATE utf8mb4_unicode_ci,
  `campaign_partner_id` int(11) NOT NULL,
  `campaign_id` int(11) NOT NULL,
  `updated_at` datetime NOT NULL ON UPDATE CURRENT_TIMESTAMP,
  `created_at` datetime NOT NULL ON UPDATE CURRENT_TIMESTAMP,
  `campaign_xml_feed_id` int(11) DEFAULT NULL,
  `city` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `state` varchar(2) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `zip` varchar(5) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `lat` double DEFAULT NULL,
  `lon` double DEFAULT NULL,
  PRIMARY KEY (`id`,`campaign_partner_id`,`campaign_id`),
  KEY `id` (`id`) USING BTREE,
  KEY `fk_campaign_xml_feed_id_campaign_inventory` (`campaign_xml_feed_id`) USING BTREE,
  KEY `search_index` (`campaign_id`,`city`(191),`state`,`zip`) USING BTREE,
  KEY `new_search` (`id`,`item_name`(191),`inventory_copy`(100),`created_at`),
  FULLTEXT KEY `inventory_copy` (`inventory_copy`,`item_name`),
  CONSTRAINT `campaign_inventory_ibfk_2` FOREIGN KEY (`campaign_xml_feed_id`) REFERENCES `campaign_partner_xml_feeds` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

zip_codes:

CREATE TABLE `zip_codes` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `zip_code` varchar(5) NOT NULL,
  `city` varchar(255) NOT NULL,
  `state` varchar(2) NOT NULL,
  `lat` double NOT NULL,
  `lon` double NOT NULL,
  PRIMARY KEY (`id`),
  KEY `city and state` (`city`,`state`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=131071 DEFAULT CHARSET=latin1

The Query:

UPDATE campaign_inventory
INNER JOIN zip_codes ON campaign_inventory.city = zip_codes.city
AND campaign_inventory.state = zip_codes.state
SET campaign_inventory.lat = zip_codes.lat,
 campaign_inventory.lon = zip_codes.lon
WHERE
    campaign_inventory.campaign_xml_feed_id = ?
AND campaign_inventory.lat IS NULL

Explain:

+----+-------------+--------------------+------+--------------------------------------------+--------------------------------------------+---------+-------+-------+----------------------------------------------------+
| id | select_type | table              | type | possible_keys                              | key                                        | key_len | ref   | rows  | Extra                                              |
+----+-------------+--------------------+------+--------------------------------------------+--------------------------------------------+---------+-------+-------+----------------------------------------------------+
|  1 | SIMPLE      | campaign_inventory | ref  | fk_campaign_xml_feed_id_campaign_inventory | fk_campaign_xml_feed_id_campaign_inventory | 5       | const |   368 | Using where                                        |
|  1 | SIMPLE      | zip_codes          | ALL  | NULL                                       | NULL                                       | NULL    | NULL  | 80034 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+--------------------+------+--------------------------------------------+--------------------------------------------+---------+-------+-------+----------------------------------------------------+

So how the heck do I optimize this thing? Any suggestions on other ways to handle geocoding at this scale? Do I just need a supercomputer of a MySQL server to handle it? If it makes any difference, the server is MySQL 5.6.23 running in Amazon RDS on a db.m4.large instance.

Thanks to all who take time to help out with this. I appreciate it.

Best Answer

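I think that for the index to be used, the joined columns need the same charset and collation in both tables. When they differ, MySQL has to convert one side before it can compare the values, and that conversion can keep it from using the index on the converted column. Even where it is not strictly mandatory, matching them should save a lot of "conversion time".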

You have DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci on one of them, whereas you have DEFAULT CHARSET=latin1 on the other. In campaign_inventory, you also explicitly specify the collation for the columns state and city.
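To see the mismatch without reading through the full CREATE statements, something like the following sketch should work. It assumes both tables live in the currently selected schema; the query only reads metadata from information_schema.

```sql
-- List charset, collation and length of the join columns in both tables.
-- Assumption: both tables are in the schema returned by DATABASE().
SELECT TABLE_NAME,
       COLUMN_NAME,
       CHARACTER_SET_NAME,
       COLLATION_NAME,
       CHARACTER_MAXIMUM_LENGTH
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME IN ('campaign_inventory', 'zip_codes')
  AND COLUMN_NAME IN ('city', 'state')
ORDER BY COLUMN_NAME, TABLE_NAME;
```

If the two rows for city (or for state) show different values in CHARACTER_SET_NAME or COLLATION_NAME, the join has to convert one side.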

I think you could try ALTER TABLE zip_codes CONVERT TO CHARACTER SET utf8mb4 and see whether this has any effect.

You also need to check whether the lengths of your varchars are appropriate. The default maximum index key prefix length in MySQL 5.6/InnoDB is 767 bytes, and with utf8mb4 (up to 4 bytes per character) you'd exceed it: (255 + 2) x 4 = 1028 > 767.
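If you would rather alter the existing zip_codes table than recreate it, a minimal sketch could look like this. The length of 50 for city and the index name state_city are my assumptions, so check the longest city name first; at 50 characters the indexed columns need at most (50 + 2) x 4 = 208 bytes, well under the limit.

```sql
-- Check that no existing city name is longer than the assumed 50 characters.
SELECT MAX(CHAR_LENGTH(city)) AS longest_city FROM zip_codes;

-- Shrink the columns, switch them to the charset/collation used by
-- campaign_inventory, and rebuild the index in a single ALTER.
ALTER TABLE zip_codes
  DROP INDEX `city and state`,
  MODIFY city  varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
  MODIFY state char(2)     CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
  ADD INDEX state_city (state, city);
```

This rebuilds the table, so on a busy server it is probably best run outside the feed-loading windows.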

If you define your tables in the following way, the index can be used:

CREATE TABLE `campaign_inventory` 
(
  `id`                   binary(16) NOT NULL,
  `item_name`            varchar(255) DEFAULT NULL,
  `enabled`              int(11) DEFAULT NULL,
  `item_type`            varchar(45) DEFAULT NULL,
  `inventory_copy`       text ,
  `campaign_partner_id`  int(11) NOT NULL,
  `campaign_id`          int(11) NOT NULL,
  `updated_at`           datetime NOT NULL ON UPDATE CURRENT_TIMESTAMP,
  `created_at`           datetime NOT NULL ON UPDATE CURRENT_TIMESTAMP,
  `campaign_xml_feed_id` int(11) DEFAULT NULL,
  `state`                character(2)  DEFAULT NULL,
  `city`                 varchar(50) DEFAULT NULL,
  `zip`                  varchar(5) DEFAULT NULL,
  `lat`                  double DEFAULT NULL,
  `lon`                  double DEFAULT NULL,

  PRIMARY KEY (`id`,`campaign_partner_id`,`campaign_id`),
  KEY `id` (`id`) USING BTREE,
  KEY `fk_campaign_xml_feed_id_campaign_inventory` (`campaign_xml_feed_id`) USING BTREE,
  KEY state_city (state, city),
  KEY `search_index` (`campaign_id`,`city`(50),`state`,`zip`) USING BTREE,
  KEY `new_search` (`id`,`item_name`(191),`inventory_copy`(100),`created_at`),
  FULLTEXT KEY `inventory_copy` (`inventory_copy`,`item_name`),
  CONSTRAINT `campaign_inventory_ibfk_2` FOREIGN KEY (`campaign_xml_feed_id`) 
    REFERENCES `campaign_partner_xml_feeds` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
)  ENGINE = InnoDB 
   DEFAULT CHARACTER SET = utf8mb4 
   COLLATE = utf8mb4_unicode_ci  ;

and

CREATE TABLE `zip_codes` (
  `id`           int(11) NOT NULL AUTO_INCREMENT,
  `zip_code`     varchar(5) NOT NULL,
  `state`        character(2) NOT NULL,
  `city`         varchar(50) NOT NULL,
  `lat`          double NOT NULL,
  `lon`          double NOT NULL,

  PRIMARY KEY (`id`),
  KEY state_city (state, city)
) ENGINE = InnoDB 
  DEFAULT CHARACTER SET = utf8mb4 
  COLLATE = utf8mb4_unicode_ci ;

You can check both your original situation and this one at RexTester.
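To confirm the change on your own server, you can run EXPLAIN on the update itself (MySQL 5.6.3 and later accept EXPLAIN UPDATE). This is only a sketch: 123 is a made-up feed id standing in for the ? placeholder, and state_city is the index name from the table definitions above.

```sql
-- 123 is a hypothetical feed id; substitute a real campaign_xml_feed_id.
EXPLAIN
UPDATE campaign_inventory
INNER JOIN zip_codes
        ON campaign_inventory.state = zip_codes.state
       AND campaign_inventory.city  = zip_codes.city
SET campaign_inventory.lat = zip_codes.lat,
    campaign_inventory.lon = zip_codes.lon
WHERE campaign_inventory.campaign_xml_feed_id = 123
  AND campaign_inventory.lat IS NULL;
```

What to look for is the zip_codes row: possible_keys should no longer be NULL, and ideally the access type changes from ALL to ref on state_city.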