MySQL – Problems updating with an inner join with multiple ON conditions

Tags: join, MySQL, update

I have an orders table in a database that contains geographical information about the city and state each order is from. This includes latitude and longitude for doing haversine calculations.

I have a new geo-location database containing the latitude and longitude of those same city/state pairs, and its values differ slightly from my previous database. This introduces some odd behavior: when I run haversine calculations, mileage is not calculated correctly because of the slight differences in latitude and longitude compared to the previous data.

What I need to do is update the latitude and longitude on the orders table to match what is in the geo-location database, based on the city and state on each order. I am trying to update my orders table like so:

update ProcurementPortal.orders as orders
inner join 
ProcurementPortal.cities_extended as geo 
on orders.city = geo.city 
and orders.state = geo.state_code
set
orders.lat = geo.latitude,
orders.lon = geo.longitude
where
orders.city = geo.city
and
orders.state = geo.state_code

The problem I am having is that MySQL Workbench gives me the following error.

Error Code: 2013. Lost connection to MySQL server during query

Is the syntax I provided correct? Would it update the orders the way I expect it to? I can run this from the command line, but it runs for a long time and I'm not sure it's really doing what it's supposed to be doing.

I'm fairly new to MySQL, so some clarification on how I could accomplish this would be wonderful.

Output of SHOW CREATE TABLE orders – note there are some garbage columns in this table that need a lot of cleaning up. The city / state columns I refer to in the posted query are oCity and oState:

CREATE TABLE `orders` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `user_id` int(10) unsigned DEFAULT NULL,
  `company_id` int(10) unsigned DEFAULT NULL,
  `action_menu` text COLLATE utf8_unicode_ci,
  `oCity` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `oState` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  -- many other columns
  `latitude` double(8,2) DEFAULT NULL,
  `longitude` double(8,2) DEFAULT NULL,
  -- more columns
  PRIMARY KEY (`id`),
  KEY `orders_company_id_foreign` (`company_id`),
  CONSTRAINT `orders_company_id_foreign` 
    FOREIGN KEY (`company_id`) 
    REFERENCES `companies` (`id`)
    ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=70120 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

Output of SHOW CREATE TABLE cities_extended:

CREATE TABLE `cities_extended` (
  `city` varchar(50) NOT NULL,
  `state_code` char(2) NOT NULL,
  `zip` int(5) unsigned zerofill NOT NULL,
  `latitude` double NOT NULL,
  `longitude` double NOT NULL,
  `county` varchar(50) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1

Running the query in question from the command line runs for a while, then returns the following error:

ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

^ this might be due to the earlier process still running. I killed that process and the query is actually going through now. Just waiting to see if I get the same 2013 error.

Best Answer

First, you have the conditions duplicated in both the ON and the WHERE clause. That is not needed; the conditions are only needed once, preferably in the ON clause, since they are used for the join of the two tables. But that is not what's causing the slow execution (and the lost connections).
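
For reference, the statement with the conditions only in the ON clause would look something like this. I've used the column names from your SHOW CREATE TABLE output (oCity, oState, latitude, longitude); if your columns really are the city / state / lat / lon from the posted query, substitute those:

UPDATE ProcurementPortal.orders AS orders
INNER JOIN ProcurementPortal.cities_extended AS geo
    ON  orders.oCity  = geo.city
    AND orders.oState = geo.state_code
SET orders.latitude  = geo.latitude,
    orders.longitude = geo.longitude ;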

There are three reasons for the slowness:

  • you are joining a 63K-row table with a 363K-row table without any index to help. That is the major issue. (The row counts can be double-checked with the queries right after this list.)

  • the columns used in the join do not have the same type, length, character set and collation. This is another major issue, worse than the first: even if you add indexes but keep the current types, the indexes will not be of much help. More seriously, it's a question whether you can or are allowed to change the types, or whether applications depend on the existing types.

  • you are potentially updating the whole table. This is rather minor as the table is small, only 63K rows. We'll see if that is still a problem after we fix the major issues.
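
(If you want to double-check the row counts mentioned above on your own copy of the data, a plain count on each table is enough:)

SELECT COUNT(*) FROM ProcurementPortal.orders ;
SELECT COUNT(*) FROM ProcurementPortal.cities_extended ;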

The first thing that needs addressing is the types (+ lengths + charsets + collations) of the two columns. Assuming that you are allowed to change the types in the orders table, you should change them (both are varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL) to match the types in the geo table (one varchar(50), the other char(2), both CHARSET=latin1, unknown/default collation).

This will be tricky and perhaps error-prone, so I suggest you first take a backup of the whole database, or at least of the orders table, before you attempt to change anything.
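
(If you just want a quick copy of the single table inside the same schema, something like the following works; a mysqldump of the whole database is of course the more thorough option. The name orders_backup is only a placeholder:)

CREATE TABLE ProcurementPortal.orders_backup LIKE ProcurementPortal.orders ;
INSERT INTO ProcurementPortal.orders_backup
    SELECT * FROM ProcurementPortal.orders ;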

If I were you, I'd then first (before changing anything) check the data in these two columns and verify that their maximum lengths are no more than 50 and 2 characters respectively, to be sure that shortening the columns will not truncate any valuable data.
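
(A quick way to check is to look at the maximum stored lengths and count any rows that would actually be truncated:)

SELECT MAX(CHAR_LENGTH(oCity))  AS max_city_len,
       MAX(CHAR_LENGTH(oState)) AS max_state_len
FROM ProcurementPortal.orders ;

SELECT COUNT(*) AS would_be_truncated
FROM ProcurementPortal.orders
WHERE CHAR_LENGTH(oCity) > 50
   OR CHAR_LENGTH(oState) > 2 ;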

I'd also check the collation of the cities_extended.city and cities_extended.state_code columns; it's probably the default latin1_swedish_ci.
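
(Either SHOW FULL COLUMNS or information_schema will tell you:)

SHOW FULL COLUMNS FROM ProcurementPortal.cities_extended ;

-- or, for just the two columns in question:
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'ProcurementPortal'
  AND TABLE_NAME   = 'cities_extended'
  AND COLUMN_NAME IN ('city', 'state_code') ;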

(Of course these two steps above can be skipped - but only if you don't really care about the orders table and you can recreate it from scratch if needed.)

After everything is double checked, we can proceed with the actual changing of the types:

ALTER TABLE ProcurementPortal.orders
    MODIFY COLUMN oCity 
        varchar(50) 
        CHARSET 'latin1'
        -- COLLATE 'latin1_swedish_ci'
        DEFAULT NULL,
    MODIFY COLUMN oState 
        char(2) 
        CHARSET 'latin1' 
        -- COLLATE 'latin1_swedish_ci'
        DEFAULT NULL ;

(The COLLATE 'latin1_swedish_ci' clauses are probably not needed, as the defaults will be used, so they are commented out.)

If that goes well and there are no errors, we can proceed with the final step, which is to add the indexes:

Add two indexes, one on each table:

ALTER TABLE ProcurementPortal.orders
   ADD INDEX state_city_ix
       (oState, oCity) ;

ALTER TABLE ProcurementPortal.cities_extended
   ADD INDEX state_city_ix
       (state_code, city) ;

After that (it will take a while but you only need to do this once), you can proceed with the update statement.
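
(If you want to be sure the new indexes will actually be used, you can EXPLAIN the join part first; at least one side of the join, most likely the cities_extended side, should show state_city_ix in the key column instead of a full scan. From MySQL 5.6 onwards you can also EXPLAIN the UPDATE itself. Same column-name caveat as above:)

EXPLAIN
SELECT orders.id
FROM ProcurementPortal.orders AS orders
INNER JOIN ProcurementPortal.cities_extended AS geo
    ON  orders.oCity  = geo.city
    AND orders.oState = geo.state_code ;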

If it's still taking too long, I'd consider making the update in batches (of say 1K-5K rows at a time). But with such small tables, I don't think you'll need that.
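
(For completeness, a sketch of batching by the primary key of orders; the batch size and the loop over the id ranges are up to you, and the same column-name caveat applies:)

UPDATE ProcurementPortal.orders AS orders
INNER JOIN ProcurementPortal.cities_extended AS geo
    ON  orders.oCity  = geo.city
    AND orders.oState = geo.state_code
SET orders.latitude  = geo.latitude,
    orders.longitude = geo.longitude
WHERE orders.id BETWEEN 1 AND 5000 ;   -- then 5001 to 10000, and so on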